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Abstract — A set of physics models and parameters pertaining 
to the simulation of proton energy deposition in matter are 
evaluated in the energy range up to approximately 65 MeV, based 
on their implementations in the Geant4 toolkit. The analysis 
assesses several features of the models and the impact of their 
associated epistemic uncertainties, i.e. uncertainties due to lack 
of knowledge, on the simulation results. Possible systematic 
effects deriving from uncertainties of this kind are highlighted; 
their relevance in relation to the application environment and 
different experimental requirements are discussed, with emphasis 
on the simulation of radiotherapy set-ups. By documenting 
quantitatively the features of a wide set of simulation models and 
the related intrinsic uncertainties affecting the simulation results, 
this analysis provides guidance regarding the use of the concerned 
simulation tools in experimental applications; it also provides 
indications for further experimental measurements addressing 
the sources of such uncertainties. 

Index Terms — Monte Carlo, simulation, Geant4, hadron ther- 
apy- 

I, Introduction 

THE simulation of the energy deposited by protons in 
matter is relevant to various experimental applications; 
radiotherapeutical applications exploit its peculiar pattern prior 
to stopping, exhibiting the characteristic "Bragg peak", to 
deliver a well localized dose to the tumor area [1]. 

Several applications of general purpose Monte Carlo sys- 
tems, like MCNP [2]-[4], GEANT 3 [5], Geant4 [6], [7], 
SHIELD-HIT [8], [9], FLUKA [10], [11] and PHITS [12] 
are documented in the literature concerning this topic, in 
hadron therapy as well as in other fields. A variety of physics 
options - theoretical models, evaluated data compilations and 
values of relevant physical parameters - is available in these 
Monte Carlo codes to model the electromagnetic and nuclear 
interactions of protons, and of their secondary products. While 
the software implementations are specific to each Monte Carlo 
code, the underlying physics modeling approaches and data 
compilations are often common to various simulation systems. 

Some of the physics models and parameters used in the 
simulation of proton interactions with matter are affected by 
epistemic uncertainties [13], i.e uncertainties due to lack of 
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knowledge. They may originate from various sources [14], 
such as incomplete understanding of fundamental physics 
processes, or practical inability to treat them thoroughly, 
non-existent or conflicting experimental data for a physical 
parameter or model, or the application of a physics model 
beyond the experimental conditions in which its validity has 
been demonstrated (e.g. at lower or higher energies, or with 
different target materials). 

The role of epistemic uncertainties in the software verifica- 
tion and validation process has been the object of research in 
the context of simulation based on deterministic methods [13]; 
these investigations are motivated by the rigorous risk analysis 
required by some sensitive applications. Limited attention has 
been devoted so far to the role of epistemic uncertainties in 
Monte Carlo simulation in particle and nuclear physics, and 
related experimental fields. This paper addresses this topic 
by illustrating it in a concrete experimental use case: the 
simulation of proton depth dose to water for radiotherapy 
applications. The simulation configuration involves a realistic 
model of a therapeutical proton beam line and beam energies 
of approximately 65 MeV; this use case is representative 
of experimental environments for the treatment of ocular 
melanoma [15]— [18]. 

Due to their intrinsic nature, related to lack of knowledge, 
epistemic uncertainties are difficult to quantify [19]. Although 
the characterization of epistemic uncertainty contributions is 
needed for many of the issues that feed the reliability model 
of complex systems, there is no generally accepted method 
of measuring epistemic uncertainties and their contribution to 
reliability estimations. A variety of mathematical formalisms 
[20] has been developed for this purpose; nevertheless, some 
of the techniques adopted in the context of deterministic sim- 
ulations, like interval analysis and applications of Dempster- 
Shafer theory of evidence [21], are not always directly applica- 
ble in identical form to the treatment of epistemic uncertainties 
in Monte Carlo simulations. 

Sensitivity analysis [19] is a tool for exploring how different 
uncertainties, including epistemic ones, influence the model 
output [22]. This approach is adopted in the study described 
here: the paper identifies a set of epistemic uncertainties in 
physics modeling pertinent to the problem domain, documents 
their impact on the simulation results, and discusses their 
potentiality to produce systematic effects in relation to the 
characteristics of the application environment. 

Typically, in statistical analyses epistemic uncertainty is 
represented as a set of discrete possible or plausible choices 
(e.g. model choices) [23]. The environment for this kind of 
study has been realized in the context of a Geant4-based 



application; the characteristic of Geant4 as a toolkit, encom- 
passing a wide variety of physics models, along with the 
feature of polymorphism characterizing the object oriented 
programming paradigm, allow the configuration of the sim- 
ulation with a large number of different physics options in 
the same software environment. This versatility makes Geant4 
a convenient playground to evaluate the effects of a number 
of physics alternatives on the experimental use case under 
study. The sensitivity analysis documented in this paper, which 
examines the response of the system to a wide set of modeling 
approaches, plays a conceptually similar role to the interval 
analysis method applied in deterministic simulation, where 
parameters subject to epistemic uncertainties are varied within 
bounds. As mentioned in the previous paragraph, sensitivity 
analysis as applied to the context described in the following 
sections contributes to identify and quantify possible system- 
atic effects in the simulation; it cannot infer anything about the 
validity of any of the physics models, for which experimental 
data would be needed. 

The physics modeling options available in Geant4 for the 
use case under study are broadly representative of the body 
of knowledge in the problem domain; the analysis described 
in this paper, albeit performed in the context of a specific 
Monte Carlo system, provides elements of interest for other 
simulation environments as well. 

Epistemic uncertainties are in principle reducible [13]; the 
analysis documented in this paper identifies areas where 
experimental measurements could reduce them, and highlights 
their requirements of accuracy and other features to improve 
the knowledge embedded in the current simulation models. 

Preliminary assessments relevant to this study were reported 
in [24]. 

II. GEANT4-BASED SIMULATION OF PROTON DEPTH-DOSE 
PROFILES 

The Geant4 toolkit includes various physics modelling 
options relevant to the application domain considered in this 
paper; they concern stopping powers, multiple scattering, cross 
sections and final state models of elastic and inelastic nuclear 
interactions of the primary protons, as well as a variety 
of electromagnetic and hadronic models for the secondary 
particles resulting from proton interactions with matter. 

Several Geant4-based simulations of proton therapy set- 
ups, like [25]- [35], have documented satisfactory agreement 
with experimental depth dose measurements in various con- 
figurations of beam lines and detectors. Some open issues 
remain, which are generally shared by simulations based on 
other Monte Carlo systems as well, due to common physics 
modeling approaches in the codes and similar practices in the 
experimental domain. 

There is evidence in the literature of different features 
produced by Geant4 physics models in the energy range below 
100 MeV [36]-[40]; the discrimination of their accuracy is 
made difficult by the still incomplete software validation, 
which is often hindered by the limited availability of exper- 
imental data, or their controversial characteristics. The same 
problem affects other Monte Carlo codes as well. Furthermore, 



some shortcomings are present in the comparisons of proton 
therapy simulations with experimental data. 

The physics configuration used in the simulation and the 
Monte Carlo kernel version are either undocumented, or 
incompletely documented, in a number of references; therefore 
it is not always possible to relate the results documented in the 
literature to the physics options and software implementations 
which produced them, thus hindering the reproducibility of the 
results. The comparison between simulated and experimental 
data is limited to qualitative appraisal in most cases; the lack 
of statistical analysis in many articles prevents the reader 
from appraising the significance of the compatibility between 
simulation and experimental data, and the relative merits of 
the simulation models in comparison to measurements taken 
in different experimental configurations and characterized by 
different uncertainties. 

It is a feature of proton therapy simulations that the beam 
parameters, namely the beam energy and energy spread, are 
not known in typical experimental set-ups of this domain with 
sufficient accuracy to base the simulation on their nominal 
values [27]. The precise knowledge of the beam energy is 
not critical to clinical applications; in that context what is 
of interest is the proton range, which can be measured very 
accurately. In common practice, the beam parameters input 
to the simulation are adjusted so that the simulation results 
best fit the measured depth dose profile [25], [27], [30], 
[33], [41], [42]; this procedure affects the significance of 
further comparisons between simulated and experimental data. 
Experimental techniques have been developed to measure the 
energy of a proton beam from radiotherapy accelerators with 
greater precision [43], but they are not commonly exploited in 
connection with simulation validation studies. 

The comparison of simulated and experimental Bragg peak 
profiles is also sensitive to the normalization procedures, 
which are often applied in experimental practice to relate 
simulated and measured data; this topic is discussed in detail 
in [44]. 

These shortcomings do not severely affect current clinical 
practice, where Monte Carlo simulation plays a role of verifi- 
cation and optimization. More demanding requirements about 
the predictive capabilities of the simulation may derive from 
new perspectives, such as the use of Monte Carlo simulation 
in treatment planning, which is the object of ongoing research 
[45]-[48], or other disciplines, like radiation protection. 

Epistemic uncertainties undermine the predictive role of 
simulation software; by identifying and quantifying them, the 
analysis described in the following sections is propaedeutic to 
further experimental and theoretical efforts to reduce them, or 
at least to control their effects. 

III. Overview of relevant physics models 

A brief overview of the Geant4 physics models relevant 
to the use case addressed in this paper is given below to 
facilitate the understanding of the results presented in the 
following sections. In the context of Geant4, particle inter- 
actions with matter are represented by processes, which can 
be implemented through different models [6], Similarities 



with the modeling approaches in other Monte Carlo codes 
are discussed; they show that the body of knowledge, as 
well as knowledge gaps, are largely shared across a variety 
of simulation systems. The information summarized in this 
section is necessarily succinct; further details can be found in 
the cited references. 



A. Electromagnetic interactions 

Geant4 electromagnetic package [49] includes two packages 
relevant to the experimental use case considered: Standard and 
Low Energy. 

The proton ionisation process in the Low Energy electro- 
magnetic package [50], [51] is articulated through models [52] 
based on the free electron gas model [53] at lower energy 
(<1 keV), on parameterisations at intermediate energy, and 
on the Bethe-Bloch equation at higher energy. Alternative pa- 
rameterisation models, identified in the following as ICRU49, 
Ziegler77, Ziegler85 and Ziegler2000 implement electronic 
and nuclear stopping powers based on [54]-[57]; their different 
energy ranges of applicability are documented in the respective 
references. The configuration of the hadron ionisation process 
is identified in the following sections through the selected pa- 
rameter! sation model. The process also deals with the emission 
of 5-rays. 

The Geant4 Low Energy electromagnetic package includes 
processes [58], [59] to handle the secondary particles resulting 
from proton interactions: electrons, photons and ions. Models 
for electrons and photons are based on data libraries (EEDL 
[60] and EPDL97 [61]) and on analytical formulations origi- 
nally developed for the Penelope [62] Monte Carlo code; both 
options account for the atomic relaxation [63] following the 
primary processes. Models based on the parameterisations in 
[55], [56], and on [64] are available for ions. 

The main features of the Geant4 Standard electromagnetic 
package are documented in [65]. More recently a model [68] 
for proton ionisation based on [54] was implemented in this 
package; this physical approach is the same as the one already 
present in the Low Energy package. The handling of energy 
loss fluctuations is implemented in this package; it is based 
on the model adopted in GEANT 3 [66] and was updated in 
Geant4 8.3 [67]. 

An assessment of Geant4 electromagnetic processes against 
the NIST (United States National Institute of Standards) 
reference data can be found in [36]. The validation of Geant4 
low energy electromagnetic processes against precision mea- 
surements of electron energy deposition is documented in [69]. 

The Standard package includes implementations [70], [71] 
of the multiple scattering process. In the early Geant4 ver- 
sions a generic process (G4MultipleScattering), based on 
the Lewis [72] theory, was applicable to any charged par- 
ticles; it has been complemented by a process special- 
ized for hadrons (G4hMultipleScattering) in Geant4 8.2, and 
one specific to electrons (G4eMultipleScattering) in Geant4 
9.3. The specialized multiple scattering processes are in- 
tended to replace the generic one in future Geant4 releases 
[73]. These processes can be configured with various mod- 
els (e.g. G4UrbanMscModel90, G4UrbanMscModel92 and 



G4UrbanMscModel93), which involve some empirical param- 
eters [74]. Early implementations of Geant4 multiple scattering 
were validated against experimental muon data at low [75], 
[76] and high [77] energy; [77] showed better accuracy of 
Geant4 multiple scattering model (as implemented in Geant4 
1.0) with respect to the GEANT 3 model based on Moliere 
theory. Recent studies, like [69], [78], have highlighted is- 
sues in the evolution of energy deposition patterns involving 
electron-photon interactions simulated with different Geant4 
versions; the observed effects have been ascribed to modifi- 
cations to Geant4 multiple scattering algorithm. The literature 
concerning recent evolutions of Geant4 multiple scattering is 
focused on issues related to the dependence of the simulation 
on transport step size and parameters of the algorithm [67], 
[68], [74]. Io the best of the authors' knowledge, the validation 
of Geant4 proton multiple scattering is not documented in 
literature yet for the energy range relevant to this study. 

The Geant4 Standard and Low Energy electromagnetic 
packages are concerned by the same epistemic uncertainties 
affecting electromagnetic physics modeling relevant to the use 
case studied in this paper, which are analyzed in sections IV-BI 
and IV-CI Differences in the outcome of simulations using 
the two packages may derive from features specific to each 
package, which are not associated with epistemic uncertainties 
in the physics domain pertinent to the use case examined in 
this paper; they could be due to numerical features, like the 
number of bins in the look-up tables used in the simulation 
and their interpolation, or algorithms and modelling choices 
specific to each package. A complete documentation and 
analysis of different features of Geant4 Standard and Low 
Energy electromagnetic packages is outside the scope of this 
paper. 

Other Monte Carlo codes used for hadron therapy sim- 
ulation adopt similar approaches for stopping power calcu- 
lation at high and low energies. At intermediate energies, 
stopping powers based on ICRU 49 Report are implemented in 
SHIELD-HIT; an improvement to include them in FLUKA is 
documented in [79], but it does not appear to be released yet in 
the current version of FLUKA (FLUKA-2008) at the time of 
writing this paper. PHITS handles proton ionization according 
to the SPAR code [80], while MCNP uses an energy weighted 
average between the high and low energy calculations [81], 
which adopt the same methodology as in SPAR. PHITS [82], 
SHIELD-HIT [42] and MCNPX do not model 6-ray emission. 

GEANT development was frozen with the 3.21 version in 
1994; the code is no longer supported, but it is still used for 
hadron therapy developments [83]. GEANT 3 simulated proton 
energy loss based on the Bethe-Bloch equation and dealt with 
£-ray production from ionization. 

MCNP multiple Coulomb scattering treats soft and hard 
interactions separately [84]: soft collisions are described using 
a continuous scattering approximation; a small number of hard 
collisions are simulated directly. Multiple scattering is based 
on Moliere theory in FLUKA and PHITS [85]; details of 
the algorithm in FLUKA are described in [86]. SHIELD-HIT 
simulates multiple scattering on the basis of a gaussian model 
[42], which gives the correlated value of the angular deviation 
and lateral displacement of the scattered particle. GEANT 3 



provided two options for multiple scattering simulation, re- 
spectively based on a Gaussian approximation and on Moliere 
theory [5]. 

B. Hadronic interactions 

The Geant4 hadronic package addresses the complexity of 
nuclear interactions through a software framework [87]. The 
baseline design can accommodate multiple implementations 
of cross sections and final state models associated with a 
process, which are either complementary in their energy range 
coverage or alternative in their modelling approach; processes 
and models are meant to be handled polymorphically through 
their respective base classes. 

1) Elastic scattering: Geant4 includes various 
elastic scattering processes: G4HadronElasticProcess, 
G4UHadronElasticProcess, G4QElastic and 

G4WHadronElasticProcess; the latter was released in 
Geant4 9.3 with the purpose [73] of allowing models for 
elastic scattering to be treated in a similar way to inelastic 
models. A G4DiffuseElastic [88] process was also released 
in Geant4 9.3; the energy range of its applicability is not 
explicitly specified in [88], but, since this process appears 
applied at energies of 1 GeV and above in the associated 
reference, one is led to assume that it is pertinent to higher 
energies than those relevant to the use case studied in this 
paper. This inference manifests an epistemic uncertainty in 
the applicability domain of this process. 

The G4HadronElasticProcess class of the hadronic pro- 
cesses package handles cross section and final state calculation 
according to the software design of [87]. 

It can be configured with the G4HadronElasticDataSet 
class, derived from G4VCrossSectionDataSet and included 
in the Geant4 hadronic cross sections package, which im- 
plements total elastic scattering cross sections derived from 
GHEISHA [89]. 

The scattering can be configured through several mod- 
els, such as G4LElastic, G4ElasticCascadeInterface and 
G4HadronElastic. G4LElastic, included in the hadronic mod- 
els package, is based on GHEISHA algorithms reengi- 
neered in Geant4; it is not meant to conserve energy 
and momentum on an event-by-event basis, but only on 
average. G4ElasticCascadeInterface, identified in the fol- 
lowing as "Bertini elastic", is included in the cascade 
package of the cascade package of hadronic models; 
it derives from G4IntraNuclearTransportModel and imple- 
ments a model based on the INUCL [90] code. The 
G4CascadeElasticInterface class in the same package activates 
both elastic and inelastic interactions. G4HadronElastic [91], 
included in the coherent Mastic package of hadronic models, 
combines elements originally developed for CHIPS (Chiral 
Invariant Phase Space) [92], [93] with other modelling ap- 
proaches; it aggregates a G4VQCrossSection object belonging 
to CHIPS and a G4ElasticHadrNucleusHE model. 

The G4QElastic process, known as "CHIPS 
elastic", delegates the calculation of cross sections 
to the G4QElasticCrossSection class, derived from 
G4VQCrossSection, and implements its own scattering 



algorithm. All the related classes are included in the 
ch.iralj.nv .phase space package of hadronic models. 

The G4UHadronElasticProcess process [91], included in 
the coherent elastic package of hadronic models, is meant to 
be configured with the dedicated G4HadronElastic model; this 
configuration is referred to in the following as "U-elastic". 

The limited documentation in the literature of Geant4 elastic 
scattering models and other codes does not facilitate the 
appreciation of their characteristics, nor the identification of 
the experimental data with respect to which some of the 
simulation models, especially those implementing parameter- 
isations, may have been calibrated. Although improvements 
to Geant4 elastic scattering modeling are mentioned in [40], 
hardly any validation of the available models is documented 
in the literature in the energy range relevant to the use case 
under study. The use of these models in the simulation is a 
source of epistemic uncertainty, due to the lack of knowledge 
of their accuracy in the energy range pertinent to this study. 

2) Non-elastic interactions: Inelastic hadron-nucleus scat- 
tering is handled in Geant4 through processes specific to each 
particle type. Processes for protons, neutrons, deuteron, triton 
and a particles are relevant to this study. 

Total inelastic cross sections derived from GHEISHA [89] 
are available in Geant4 through G4HadronInelasticDataSet 
for all hadrons, a particles, deuteron and triton; alternative 
implementations based on [94], [95] are available for some 
energy ranges and target materials. Cross sections describ- 
ing neutron-nucleus scattering with higher precision below 
20 MeV are available in G4NeutronHPInelasticData. Special- 
ized cross sections, based on [96]- [101], are available for 
ions. 

Parameterised and theory-driven [102] models of nuclear 
inelastic scattering are available in Geant4 for protons and 
other particles, concerning the energy range pertinent to this 
study. 

Geant4 Low Energy Parameterised models (LEP), originat- 
ing from GHEISHA [89], handle protons, neutrons, pions, a 
particles, deuterons and tritons. 

The CHIPS G4QInelastic inelastic process [92], [93] imple- 
mented in Geant4 is applicable to hadron inelastic scattering 
in the energy range pertinent to this study. 

Various options of theory-driven models describe the phases 
of intranuclear transport, preequilibrium and nuclear deexci- 
tation in Geant4. Other Monte Carlo codes used in proton 
therapy applications use a similar multi-stage approach to 
simulate proton inelastic interactions. The primary proton 
energy in this study lies at the edge of what is commonly 
considered as the range of transition between intranuclear 
cascade and preequilibrium models. 

Geant4 includes three cascade models, each one with further 
associated models describing the lower energy phases: they are 
known as Binary [103], Bertini [104], [105] and Liege [106] 
cascade models. 

The Binary cascade model [103] adopts a hybrid approach 
between a classical cascade code and a quantum molecular dy- 
namics model. It handles the preequilibrium phase through the 
Precompound model (G4PreCompoundModel) [107], whose 
implementation is based on Griffin's exciton model [108], 



[109]; this model can be activated in a simulation application 
either through the Binary cascade model or as an independent 
model. The nuclear deexcitation associated with the Precom- 
pound model can be configured with various options: they 
include an evaporation model based on Weisskopf-Ewing's 
[110], [111] theory, exploiting Dostrovsky's computational 
approach [112], the Generalized Evaporation Model (GEM) 
[113] (also used by PHITS), and the optional activation of the 
Fermi break-up [107]. 

A similar approach is adopted in SHIELD-HIT, whose 
default model considers a fast cascade stage, which brings 
the interaction between the projectile and target to a sequence 
of binary collisions [114]; this stage is followed by preequi- 
librium [115] and deexcitation of residual nuclei, with Fermi 
break up of light nuclei and evaporation. 

The Geant4 Bertini Cascade implementation [104], [105] is 
a reengineering of the INUCL code [90], which is based on 
Bertini's approach [116] to intranuclear transport; it handles 
the preequilibrium phase based on Griffin's exciton model and 
the evaporation phase based on Weisskopf's and Dostrovsky's 
approach. The preequilibrium part of INUCL is based on 
the Cascade Exciton Model (CEM) [117], which is one of 
the options of MCNPX for proton transport and is also 
implemented in SHIELD (as well as in other codes) [118]. 

A cascade model based on Bertini's scheme, derived from 
the HETC [119] implementation in LAHET [120], is available 
in MCNPX to handle protons, besides the ISABEL [121], 
[122] and CEM options. The HETC model was also interfaced 
to GEANT 3 through CALOR [123]. 

The INCL Intranuclear Cascade [106] model, better known 
as Liege Cascade, has been reengineered in Geant4 together 
with the associated ABLA evaporation model [124]; it was first 
released in Geant4 version 9.1 [125] and further improved in 
Geant4 9.3. INCL is included [126] in the LAHET [120] code 
system used by MCNPX. Although, according to [125], INCL 
is meant for energies above 200 MeV, satisfactory applications 
at energies of a few tens MeV are reported in the literature 
[127]-[129]. 

The simulation of low energy proton interactions in PHITS 
is based [130] on the MCNP4C and NMTC [131] codes; the 
latter incorporates Bertini's cascade model [116] for nucleon 
and meson transport. 

FLUKA handles inelastic scattering through PEANUT 
[132] in the energy range relevant to the use case under study; 
it involves a sequence of intranuclear cascade followed by 
preequilibrium and deexcitation. The preequilibrium model 
is based on the formalism developed by Blann [133] with 
some modifications [134]; the evaporation model is based 
on Weisskopf-Ewing's approach [135]; Fermi break-up is 
modeled for light nuclei [135]. 

Hadronic interactions were not handled by GEANT 3; their 
treatment was delegated to external packages (GHEISHA, 
CALOR and an early version of FLUKA) interfaced to it. 

Limited documentation regarding the validation of Geant4 
inelastic scattering models relevant to the use case of this 
study is available in the literature. Some comparisons with 
experimental data are reported in [38], [39], [40], [91], [103], 
[136], [137]: although most of the results shown in these 



references are not directly related to the use case investigated 
in this paper, they demonstrate the ongoing validation efforts 
in this domain. 

C. Epistemic uncertainties in the simulation models 

Epistemic uncertainties in the physics simulation models 
arise from various sources. 

In some cases, the lack of knowledge concerns the value 
of a physical parameter: this is the case, for instance, for 
the mean water ionization potential, for which various values, 
originating from experimental measurements or theoretical 
calculations, are documented in the literature. 

Other sources of uncertainty are associated with values, used 
in the simulation, deriving from parameterisations, or fits to 
experimental data (which may be inspired by theoretical mo- 
tivations): in the present study this concerns, for instance, the 
cross sections of nuclear elastic and inelastic interactions, and 
proton stopping powers. In these cases the uncertainties derive 
from the measurements themselves, the criteria by which the 
data are selected for the fit, and from the parameterisation or 
fitting process. 

Some models may embed parameters or, more generally, 
features, which are adjusted in the software implementation 
according to empirical procedures: from calibration with re- 
spect to experimental data to educated guesses, in the absence 
of pertinent measurements. This is the case, for instance, for 
the Geant4 multiple scattering model and forsome hadronic 
interaction models. In this respect, and also for models based 
on parameterisations or fits to experimental data, an important 
issue is the distinction between the calibration and validation 
of simulation models; for the reader's convenience, these con- 
cepts pertaining to simulation epistemology are briefly recalled 
here. Calibration is the process of improving the agreement of 
a code calculation with respect to a chosen set of benchmarks 
through the adjustment of parameters implemented in the code 
[19]; in the Monte Carlo simulation jargon this process is also 
known as "tuning". Software validation is defined in the IEEE 
Standard for Software Verification and Validation [138]. This 
generic definition is adapted to specific application domains 
with some slight variants; regarding simulation, validation is 
usually intended as the process of determining the degree to 
which a model is an accurate representation of the real world 
from the perspective of its intended uses, or of confirming 
that the predictions of a code adequately represent measured 
physical phenomena [19], [139]. 

The limited documentation in the literature of the calibration 
of the physics models implemented in Monte Carlo codes 
does not facilitate the understanding whether some of the 
comparisons with experimental data reported in the literature 
document calibration results and their experimental bench- 
marks, or model validation. 

The use of a simulation code for predictive purposes outside 
the scope of its validation necessitates extrapolation beyond 
the understanding gained strictly from experimental validation 
data. This type of uncertainty in our inference is primarily 
epistemic. 

Regarding the energy range of nuclear interactions relevant 
to the use case considered in this paper, a long debate has 



TABLE I 
Proton physics modelling options in the simulation 



Physics domain 


Option 


Process class 


Model class 


Proton 

stopping 

powers 


ICRU49 
Ziegler77 
Ziegler85 
Ziegler2000 


G4hLowEnergyIonisation 
G4hLowEnergyIonisation 
G4hLowEnergyIonisation 
G4hLowEnergyIonisation 


G4hICRU49p 
G4hZieglerl977p 
G4hZieglerl985p 
G4hSRIM2000p 


Multiple scattering 


Generic multiple scattering 
Hadron multiple scattering 


G4MultipleScattering 
G4hMultipleScattering 


G4UrbanMscModel92 
G4UrbanMscModel90 


Hadronic 

elastic 

scattering 


LEP 

U-elastic 

Bertini-elastic 

CHIPS-elastic 


G4HadronElasticProcess 
G4UHadronElasticProcess 
G4HadronElasticProcess 
G4QElastic 


G4LElastic 

G4HadronElastic 

G4ElasticCascadeInterface 


Hadronic inelastic 
cross sections 


Default 
Wellisch [94] 


G4ProtonInelasticProcess 
G4ProtonInelasticProcess 


G4HadronInelasticDataSet 
G4ProtonInelasticCrossSection 


Hadronic 

inelastic 

scattering 


LEP 

Precompound 

Precompound-GEM 

Precompound-Fermi break-up 

Binary cascade 

Bertini cascade 

Liege 

CHIPS-inelastic 


G4ProtonInelasticProcess 
G4ProtonInelasticProcess 
G4ProtonInelasticProcess 
G4ProtonInelasticProcess 
G4ProtonInelasticProcess 
G4ProtonInelasticProcess 
G4ProtonInelasticProcess 
G4QInelastic 


G4LEProtonInelastic 

G4PreCompoundModel 

G4PreCompoundModel, G4ExcitationHandler 

G4PreCompoundModel, G4ExcitationHandler 

G4BinaryCascade 

G4CascadeInterface 

G4InclAblaCascadeInterface 



been going on for decades in the literature about different 
theoretical preequilibrium models, respectively based on the 
so-called "exciton" and "hybrid" approaches [140]. These 
discussions involve subtle theoretical arguments; however, it 
has been acknowledged that, whichever theoretical approach 
is chosen to model equilibrium emission, the effects of overly 
simple or untested assumptions can be compensated by means 
of other uncorrelated phenomenological parameterisation of 
the model. This ongoing theoretical debate is not expected to 
be relevant to the use case addressed in this study, since the 
differences of the various theoretical approaches are expected 
to affect mainly exclusive channels [141], with negligible 
effects on the resulting deposited energy, which is the object 
of this paper. The sources of epistemic uncertainties in the 
simulation reside in the phenomenological features of the 
nuclear model implementations, rather than in the choice of 
theoretical approach to preequilibrium modeling. 

The analysis described in the following sections identifies 
sources of epistemic uncertainties in the physics domain of the 
simulation and evaluates systematic effects on the simulation 
outcome associated with them. 

IV. Software configuration 
A. Simulation 

The simulation application includes components responsible 
for the configuration of the geometry and materials of the 
experimental set-up, the generation of the proton beam, the 
selection of the physics features to be used in the particle 
transport, the collection of relevant observables in dedicated 
objects, and the control of the user's interaction with Geant4 
kernel at various stages of the execution. 

The geometry configuration encompasses a realistic model 
of a proton therapy beam line and a volume, placed at the 
exit of it, where the energy deposited by the proton beam is 
scored. The beam line model exploits the code of the geometry 
and material composition of a real-life proton therapy facility 
[18], which is publicly released in the Geant4 hadrontherapy 



[30] example; the implementation of the beam line geometry 
as in Geant4 8.1 was used for all the simulation productions. 
The scoring volume consists of 4 cm cube, filled with water; 
its size is adequate to contain the formation of the Bragg 
peak of energy loss produced by the proton beam. This 
volume is defined "sensitive" [6] in Geant4 terms. A readout 
geometry, longitudinally segmented in 200 [im thick slices, is 
superimposed to the mass geometry of the sensitive volume; 
the longitudinal segmentation determines the resolution of the 
simulation in the location of the Bragg peak, and mimics the 
resolution of typical measurements with ionization chambers 
in experimental practice. The energy deposit profile is scored 
through Geant4 hits objects. The figures of longitudinal energy 
deposition profiles included in the following sections show 
the energy deposited in each slice of the readout geometry by 
primary protons and secondary particles, integrated over all 
the events in the simulation run. 

Primary protons are generated according to a Gaussian dis- 
tribution with 63.95 MeV mean energy and 300 keV standard 
deviation, unless differently specified in the following sections. 

A variety of physics options is configured by means of 
a software design exploiting Geant4 G4VModularPhysicsList 
and G4VPhysicsConstructor as base classes. A class derived 
from Geant4 kernel's G4VModularPhysicsList is responsible 
for driving the physics selection in the simulation application: 
it assembles a physics configuration by means of subclasses of 
G4VPhysicsConstructor, each one responsible for instantiating 
the physics processes and models pertinent to a particle type 
(or a group of particles, like ions) and an interaction type 
(electromagnetic, hadronic elastic or hadronic inelastic). 

The proton physics options and corresponding Geant4 
classes evaluated in this study are summarized in Table U 

For convenience, a subset of G4VPhysicsConstructor sub- 
classes, corresponding to the selection in Table [II] has been 
defined as a reference configuration for this study. 

In all the productions the interactions of secondary particles 
are simulated as in Table HI] unless differently specified in the 
following sections. 



TABLE II 
Reference physics configuration in the simulation 



Particles 


Physics selection 


Electrons 
Photons 


Low energy electromagnetic package, 
library-based models 


Positrons 


Standard electromagnetic package 


Protons 


G4hLowEnergyIonization, ICRU49 parameterisations 

G4UHadronElastic process, G4HadronElastic model 

G4ProtonInelasticProcess, Precompound model 

Inelastic cross sections as in [94] 


Neutrons 


G4UHadronElastic process, G4HadnmEkistic model 

G4NeutronInelasticProcess, Precompound model 

G4HadronCaptureProcess, LEP model 

G4HadronFissionProcess, LEP model 

Inelastic cross sections as in [94] 


Deuteron 

Triton 

a 


G4hLowEnerjr\Ionization, ICRU49 parameterisations (scaled) 

G4UHadwnElastic process, G4HadronElastic model 

Inelastic process specific to each particle, LEP models 

Tripathi and Shen cross sections 


Charged 


G4MultipleScattering process 



Options regarding the water ionization potential can be 
introduced in the simulation by setting the value of this 
parameter through the public interface of the G4Material class. 

The secondary particle production threshold is set at 1 /im 
(in range) in the water volume and 10 fim in the passive 
components of the experimental set-up. The results presented 
in the following sections are generated with 10 /im maximum 
step size; the step limit is set approximately an order of 
magnitude smaller than the thickness of the slices in the 
readout geometry to ensure adequate precision in scoring the 
longitudinal energy deposition profile. 

B. Simulation production 

Simulated data were produced by instantiating various com- 
binations of physics constructor objects in the simulation 
application corresponding to the options listed in Table U 

The simulation production concerned a set of representative 
physics configurations, each differing from the reference one 
of Table Ullbv one modeling feature only; this strategy allowed 
the unambigous attribution of the differences observed in each 
simulation to the physics feature specifically under investi- 
gation. One million events were generated for each physics 
configuration. 

Data samples corresponding to all the options listed in Table 
U were produced with Geant4 9.3, the latest version available 
at the time of writing this paper. Data samples corresponding 
to a subset of physics configurations were also produced with 
three other Geant4 versions: 8.1p02, 9.1 and 9.2p03. Versions 
8.1p02 and 9.1 were involved in the validation of electron 
energy deposition in [69]; Geant4 8.1 was previously used 
for the assessement of some Geant4 physics models in [35] 
(subject to the same beam settings adjustments as in [27]), 
while Geant4 9.1 is still widely used in production mode by 
various experiments. Geant4 9.2p03 is an updated version of 
Geant4 9.2 including corrections; it was released two months 
later than Geant4 9.3. 

The simulations were run on Intel Core2 Duo Processors 
E6420, equipped with 2.13 GHz CPU and 4 GB RAM, 
Scientific Linux 4 operating system and gcc 3.4.6. compiler. 



C. Data analysis 

Data analysis objects were handled in the simulation appli- 
cation code through AIDA [142] abstract interfaces; the iAIDA 
[143] implementation was used. 

The results of the various simulation configurations were 
compared to the outcome of the reference one. No normaliza- 
tion was performed on the Bragg peak profiles to be compared 
(unless explicity stated in the following sections): therefore 
the distributions, which originate from the same number of 
primary protons in all the simulation configurations, allow the 
appreciation not only of differences in shape and peak location, 
but also of absolute effects, like the proton acceptance in the 
sensitive volume and the total energy deposited in it. 

Within a given Geant4 version, any observed difference 
in the results is to be ascribed to intrinsic effects of the 
physics models activated in the simulation. To the best of 
the authors' knowledge, the Geant4 transport kernel did not 
modify its behavior through the versions considered in this 
study; differences observed in comparisons across different 
Geant4 versions reflect the evolution of the physics model 
implementations . 

The differences associated with the various simulation con- 
figurations were quantitatively estimated by means of statis- 
tical methods; the comparisons exploited non-parametric, un- 
binned goodness-of-fit tests available in the Statistical Toolkit 
[144], [145]. Different algorithms were applied to each test 
case to evaluate possible systematic effects on the results 
due to particular features of the tests: the Anderson-Darling 
[146], [147], Cramer-von Mises [148]-[150] and Kolmogorov- 
Smirnov [151], [152] tests. The null hypothesis in the test 
process assumed the equivalence between the distributions 
subject to comparison. The critical value for the rejection of 
the null hypothesis was set at 0.05, unless differently specified; 
in the following sections the expression "95% confidence 
level" is used to indicate 0.05 significance of the test. 

Goodness-of-fit tests compared the whole longitudinal dis- 
tributions of energy deposition, as well as the distributions 
corresponding to the left and right branches of the longitudinal 
profiles, i.e. at penetration depths respectively up to and 
beyond the peak position, to ascertain the compatibility of 
the data samples in detail. 

The comparison results mentioned in the following sections 
concern the left branch of the energy deposition profiles, unless 
differently specified. Due to the mathematical features of the 
test statistics and empirical distribution functions, the left 
branch of the profiles provides the most sensitive test case to 
highlight differences in the distributions subject to comparison. 

V. Results 

The following sections summarize the main results of the 
analysis of the various Geant4 physics modelling options; they 
concern the longitudinal energy deposition profile. 

The lateral energy deposition pattern is also of interest to 
proton therapy, and to other experimental applications as well; 
related major sources of epistemic uncertainties in the simula- 
tion are the models of multiple scattering and nuclear elastic 
scattering, and secondary particle production from nuclear 




24 26 

Depth (mm) 




15 20 

Depth (mm) 



Fig. I. Bragg peak profile resulting from electromagnetic interactions 
only (blue dotted curve), electromagnetic interactions and hadronic elastic 
scattering (solid red curve), electromagnetic, hadronic elastic and inelastic 
scattering (dashed black curve). The profiles were produced with the Geant4 
9.3 physics configuration listed in Table ITTl and subsets of it. The Bragg peak 
is from one million primary protons, generated with (E) = 63.95 MeV 
mean energy and a = 300 keV standard deviation incident on water; the 
plot shows the deposited energy in each slice of the longitudinally segmented 
readout geometry, integrated over all the generated events. 



Fig. 2. Longitudinal energy deposition due to protons (dark shaded area) 
and electrons (light shaded area); the thick line represents the total energy 
deposited by all particle species. The profiles were produced with the Geant4 
9.3 physics configuration listed in Table ITTl the electron production threshold 
was equivalent to 250 eV in water. The Bragg peak is from one million 
primary protons with (E) = 63.95 MeV and a = 300 keV incident on 
water; the plot shows the deposited energy in each slice of the longitudinally 
segmented readout geometry, integrated over all the generated events. 



interactions. The data samples produced for the study of the 
longitudinal energy deposition profile are of inadequate size 
to draw any statistically significant conclusions concerning the 
effects of epistemic uncertainties on the lateral energy distri- 
bution patterns; their quantification at comparable significance 
level would require simulation samples at least two orders of 
magnitude larger for the analysis of the lateral distribution of 
energy deposition than for the longitudinal one. Such a large 
scale simulation production in a realistic experimental use case 
was beyond the practical reach of the limited computational 
resources available to the authors; it should be pursued once 
adequate computing means are available. 

Unless differently specified in the text, the results were 
produced with Geant4 version 9.3. 



A. Shaping the longitudinal energy deposition 

Electromagnetic, hadronic elastic and inelastic interactions 
contribute to a different extent to shaping the energy deposition 
as a function of penetration depth; hadronic interactions are 
known [28], [29] to be relevant to the simulation of the 
proton Bragg peak. The relative contribution was evaluated by 
activating partial and full components of the reference physics 
configuration in the simulation: electromagnetic interactions 
only, electromagnetic interactions and hadronic elastic scatter- 
ing, and the full set, including hadronic inelastic processes, as 
in Table [TT] Fig. Q] shows the longitudinal energy deposition 
resulting from the various contributions: the peak depth and the 
overall pattern of energy deposit are dominated by the elec- 
tromagnetic component, while elastic and inelastic hadronic 
interactions contribute to modify its shape. Other options of 
Geant4 electromagnetic and hadronic models result in similar 
apportioning among the various physics contributions. 



The energy deposited in the sensitive volume derives from 
protons, electrons and other particles; the contribution from 
the latter amounts to less than 1%. 

The relevance of electrons' contribution to the deposited 
energy is related to the electron production threshold set in the 
simulation application; the results described in this paper were 
produced with a threshold equivalent to 250 eV in the sensitive 
water volume, that is the lowest energy recommended for 
use with Geant4 library-based electron and photon processes, 
and corresponds to the setting in the validation study of 
[69]. The resulting contribution of secondary electrons to the 
longitudinal energy deposit profile is illustrated in Fig. |2 

The accuracy of the secondary electron simulation con- 
tributes to the overall accuracy of the energy deposition 
deriving from primary protons; epistemic uncertainties in the 
electron simulation models may affect the results. For Geant4, 
the validation of the longitudinal energy deposition of elec- 
trons in an energy range relevant to this study is documented 
in [69]. 

Low-energy electrons are of great importance in therapeu- 
tical particle beams, since they are very powerful in causing 
lethal damage to the cells [153]; the accuracy of <5-ray simula- 
tion is especially important for studies of the biological effects 
of proton irradiation. 

The passage of primary particles through the beam line 
affects the acceptance, i.e. the fraction of primary particles 
reaching the sensitive volume, and the characteristics of the 
particles entering it. 

The spectrum of protons at the entrance of the sensitive 
volume, after traversing the beam line, is shown in Fig. [3] 
The proton interactions in the beam line shift the mean of the 
energy distribution to a lower value than the nominal beam 
energy of 63.95 MeV and broaden its width, originally of 300 
keV: the peak part of the spectrum in Fig. [3] is well fit (with 
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Fig. 3. Proton energy spectrum at the entrance of the sensitive water volume, 
after traversing the beam line; the primary beam was modelled according to 
a gaussian distribution, with (E) = 63.95 MeV and a = 300 keV. 



p-value 1) by a gaussian distribution with 59.823 MeV mean 
and 376 keV width (standard deviation). The energy spectrum 
of the protons entering the sensitive volume is characterized 
by an extended low energy tail, which affects the plateau of 
the energy deposition profile in the water volume: low energy 
protons stop at lower depth in the water volume, producing a 
localized large energy deposit typical of the Bragg peak. This 
effect is highlighted in Fig. |4] which compares the energy 
deposition profile resulting from the proton spectrum shaped 
by the beam line to the one produced by the same spectrum, 
where the tail was suppressed. To suppress the tail, primary 
protons at the entrance of the sensitive volume, whose energy 
differed by more than three standard deviations from the 
59.823 MeV peak value, were not tracked further. A data 
sample consisting of 280000 primary protons entering the 
sensitive volume was used for producing the energy deposition 
profile in Fig. HI this sample is larger than the ones shown in 
other figures to better expose the effects of the low energy 
proton tail. 

In the plateau at lower penetration depth the difference 
between the two curves amounts to more than 15%, while 
the shape of the peak is hardly affected. This feature affects 
parameters used in clinical practice to evaluate the quality of 
the irradiation, like the peak to entrance ratio. This analysis 
shows that imprecise knowledge of the beam line geometry 
and materials can affect the energy deposited in the sensitive 
volume; for accurate simulation of the energy deposition in the 
sensitive water volume, not only accurate modelling of particle 
interactions in water is important, but also in the materials of 
the beam transport line. 

In experimental practice, the features of the particle spec- 
trum should be taken into account in the choice of the optimal 
technique for the validation of simulation models: for instance, 
Faraday-cup based dosimetry is more sensitive to the energy 
distribution of the proton beam than ionization chambers or 
calorimeters [154]; the presence of a small admixture of 
low-energy scattered protons can lead to significant errors in 
absorbed dose determination with Faraday cup. [154]- [156]. 



Fig. 4. Energy deposition versus penetration depth resulting from the whole 
spectrum of particles entering the water phantom (solid line) and produced 
only by the peak portion of the spectrum of Fig. [5] (dashed line), excluding 
the low energy tail. The peak portion of the primary proton spectrum at the 
entrance has 59.823 MeV mean energy and 376 keV standard deviation; 
the figure shows the deposited energy in each slice of the longitudinally 
segmented readout geometry, integrated over 280000 primary protons entering 
the sensitive water volume. 



The acceptance values calculated with the various physics 
model combinations listed in Table U are all compatible within 
the statistical uncertainties. Further details about the effects of 
physics models, and their associated epistemic uncertainties, 
on the determination of the acceptance in the sensitive volume 
are examined in section [V^Gl 

B. Water mean ionisation potential 

Knowledge of the mean excitation energy of a medium 
is needed to calculate the energy loss of a charged particle 
penetrating that medium; various theoretical calculations and 
experimental measurements are documented in the literature 
concerning this parameter. The value (75±3 eV) recommended 
in ICRU Report 49 [54] is commonly used in Monte Carlo 
codes (e.g. Geant4, FLUKA, MCNPX). Nevertheless, values 
differing from this reference have recently been proposed: 
among them, 80. 8 ±3 eV in [157], based on theoretical and 
experimental considerations, 61.77 eV in [158], 79.7±0.5 eV 
in [159] and 81.6 eV in [160]; a lower value of 67.2 eV 
is assumed in ICRU Report 73 [64], [157]. An experimental 
determination of 78.4±1.0 eV was reported in [165], where a 
Geant4 simulation encompassing the ICRU49 stopping power 
model was utilized. The lack of consensus about the value of 
this parameter corresponds to an epistemic uncertainty in the 
simulation. 

The effect of the uncertainty of the water ionization potential 
was estimated by performing simulations with values of 75 
eV (as in the reference physics configuration), 67.2 eV and 
80.8 eV; apart from this feature, the application activated the 
physics configuration summarized in Table HI1 The longitudinal 
energy deposition profiles corresponding to different values 
are shown in Fig. |5] A small shift in the depth of the peak 
is visible; the 67.2 eV and 80.8 eV settings displace the peak 
to the adjacent readout geometry slices with respect to the 
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Fig. 5. Bragg peak profile resulting from different water ionisation potentials 
and proton beam energies: 75 eV [54] (solid black curve), 67.2 eV (dotted blue 
curve), 80.8 eV (dashed red curve) with proton beam energy of 63.95 MeV, 
and 80.8 eV (dot-dashed green curve) with proton beam energy of 63.65 MeV. 
The Bragg peak is from one million primary protons with (E) = 63.95 MeV 
and cr = 300 keV incident on water; the plot shows the deposited energy in 
each slice of the longitudinally segmented readout geometry, integrated over 
all the generated events. 



depth resulting from the water ionization potential set at 75 eV. 
As described in section IIV-AI the resolution in the Bragg 
peak location achievable in the simulation is 200 fxm, which 
corresponds to the longitudinal segmentation of the readout 
geometry. 

Similar effects were also observed in simulations with the 
SHIELD-HIT code [161] and with FLUKA [162]; [35] reports 
approximately 1 mm shift between Geant4-based Bragg peak 
simulations of 85.6 and 209.2 MeV proton beams, respectively 
with 75 eV and 70.9 eV water mean ionization potential 
(however, without specifying the longitudinal resolution of the 
deposited energy collection). 

In experimental practice, the ionization potential is usually 
treated as a free parameter in the simulation, which is adjusted 
to improve the match between experimental and simulated 
data (e.g. [163], [164], [166], [167]). It is worth noting that 
different optimal values of this parameter were identified in 
the literature to best match the respective experimental data. 

The experimental environment typical of therapeutical beam 
lines provides limited insight into this simulation feature, 
due to the common practice of empirically determining the 
optimal beam parameters based on the comparison between 
simulated and observed depth dose profiles, as discussed in 
section U A test was performed to investigate this issue: two 
simulations were executed with different, but equally plausible 
settings - respectively with 63.95 proton beam energy and 75 
eV ionization potential, and with 63.65 MeV proton beam 
energy and 80.8 eV ionization potential; the beam energy shift 
is compatible with typical uncertainties of the experimental 
environment under study. The resulting longitudinal energy 
deposition profiles, shown in Fig. |5J are practically undistin- 
guishable: the peak positions coincide, and the goodness of 
fit tests mentioned in section IIV-CI confirm their compatibility 
at 90% confidence level. The comparison with experimental 



data, whose beam energy is not known a priori with adequate 
precision, would not be capable of discriminating the accuracy 
of such distributions. 

The systematic effects highlighted by this analysis are rele- 
vant only when the simulation is expected to play a predictive 
role. In common applications, where the simulation is used 
only for verification purpose, the empirical adjudstments of 
the water mean ionization potential and of the proton beam 
parameters mask any potential systematic effects. The exper- 
imental discrimination of the simulation accuracy resulting 
from different water ionization potential values would require 
precise knowledge of the beam parameters and accurate mea- 
surements of the energy deposition as a function of depth, 
such that displacements of the Bragg peak smaller than 200 
yitm, associated with the value of this parameter, could be 
appreciated. 

C. Proton stopping powers 

Compilations of proton stopping powers are available in 
[54], [55], [56], and in the SRIM [168] code. Despite the 
wide body of experimental data, theoretical calculations and 
empirical models of proton stopping powers, no consensus has 
yet been achieved on definitely established values. Evaluations 
of empirical and theoretical stopping power models reported 
in the literature [169], [170] are limited to a few elements 
and compounds; they highlight differences among the various 
compilations. According to these analyses, more recent stop- 
ping power models do not necessarily correspond to improved 
accuracy; some models describe the stopping powers for some 
materials well, but appear less accurate for other materials. 

Due to this controversial situation, proton stopping powers 
are a source of epistemic uncertainty in the simulation results. 
A study was performed to evaluate the effects on the Bragg 
peak profile related to different stopping power models imple- 
mented in Geant4. 

The simulations were performed with physics settings as in 
Table [Til apart from configuring the low energy hadron ion- 
ization process with various stopping power parameterisation 
models. The energy range of application was set according 
to the recommendations of the respective references for the 
ICRU49, Ziegler77 and Ziegler85 parameterisation models; 
lacking specific documentation about the applicability of the 
Ziegler2000 parameterisation, this model was applied up to 
2 MeV, as for ICRU49. 

The results are illustrated in Fig. [6j the various proton 
stopping power models produce slightly different energy de- 
position profiles; the Ziegler77 and Ziegler85 models produce 
almost identical results. The peaks associated with alterna- 
tive proton stopping power models are located in adjacent 
longitudinal readout geometry slices with respect to the peak 
produced by the reference configuration of Table [TTJ, including 
ICRU49 stopping powers; as described in section IIV-AI the 
longitudinal readout segmentation is 200 /im. Apart from the 
shift in the peak position, the shapes of the energy deposition 
profiles are statistically compatible at 90% confidence level 
according to all the goodness of fit tests mentioned in section 
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Fig. 6. Energy deposition as a function of depth resulting from different 
proton stopping power models: ICRU49 (solid black curve), Ziegler77 (dash- 
dotted green curve) Ziegler85 (dashed red curve) and Ziegler2000 (dotted 
blue curve); the profiles correspondign to the Ziegler77 and Ziegler85 models 
are barely distinguishable in the plot, due to the great similarity of their 
simulation results. The Bragg peak is from one million primary protons with 
(E) = 63.95 MeV and a = 300 keV incident on water; the plot shows 
the deposited energy in each slice of the longitudinally segmented readout 
geometry, integrated over all the generated events. 



Fig. 7. Energy deposition profiles associated with various proton elastic 
scattering options: "U-elastic" (solid black curve), Bertini (blue dashed curve), 
LEP (green dot-dashed curve) and CHIPS (red dotted curve). The Bragg peaks 
are from one million primary protons with (E) = 63.95 MeV and a = 300 
keV incident on water; the plot shows the deposited energy in each slice of the 
longitudinally segmented readout geometry, integrated over all the generated 
events. 



constrain any simulation parameters. 



Similarly to the discussion in the previous section con- 
cerning the water mean ionization potential, the epistemic 
uncertainty related to the stopping power model used in 
the simulation turns into systematic effects only when the 
simulation is required to play a predictive role; otherwise, 
as shown in the previous case, a small adjustment of the 
proton beam energy, compatible with typical experimental 
uncertainties, would shift the energy deposition profiles de- 
riving from different stopping power models into statistically 
equivalent distributions. These considerations suggest that 
typical proton therapy experimental environments would not 
be sensitive to the differences of stopping power models, nor 
would they be adequate to discriminate their accuracy. All the 
available stopping power models appear equally suitable to 
that simulation environment; in this respect, one can observe 
that the use of different Geant4 models has been reported 
with satisfactory agreement against experimental data, despite 
the fact that they determine different Bragg peak depths: for 
instance, Ziegler2000 in [31] and ICRU49 in [27], [30], [33]. 

If predictive capabilities are required from the simulation, 
higher precision experimental measurements would be neces- 
sary to discriminate the accuracy of the existing models with 
the capability of appreciating shifts in the peak depth smaller 
than 200 [im. 

This context should be taken into account when considering 
comparative evaluations of the accuracy of simulation models: 
the procedure of empirically adjusting the parameters in the 
simulation, based on a selected physics configuration, to best 
fit the experimental data is prone to bias further comparisons 
of other simulation models with the same data. The estimate 
of the relative accuracy of alternative physics models would 
require the capability of comparing the simulation outcome 
to measurements, without privileging any of the models to 



D. Hadronic elastic scattering 

Four elastic scattering modeling options available in Geant4 
were compared: the G4UHadronElasticProcess with the 
G4HadronElastic model, the G4HadronElasticProcess pro- 
cess with the G4LElastic (LEP) or G4ElasticCascadeInterface 
(Bertini elastic) models, and the CHIPS G4QElastic process. 

The simulation application activated the physics configura- 
tion as in Table [TT] for all other features apart from proton 
elastic scattering. The longitudinal energy deposition profiles 
resulting from the various simulation configurations are shown 
in Fig. [7] The distribution of the relative difference of the 
energy deposited in each longitudinal slice of the sensitive 
volume with respect to the outcome from the reference con- 
figuration of Table iHlis shown in Fig.|8]for the various options; 
the differences are mostly comprised within ±2%. 

The compatibility of the energy deposition profiles associ- 
ated with the various elastic scattering options is confirmed by 
goodness-of-fit tests, whose results are reported in Table [III] 
All the tests fail to reject the null hypothesis of compatibility 
with the profile deriving from the reference configuration, 
with 0.1 significance. A subset of elastic scattering modeling 
options, limited to the "U-elastic", "Bertini" and "LEP" ones, 
was compared in the context of Geant4 8.1p02 and 9.1 
versions as well. All the considered elastic scattering options 
were compatible with 0.1 significance within a given Geant4 
version; the results concerning the comparison of the left 
branch of the energy deposition profiles are reported in Table 
EH 

If the differences between the outcome of two simulation 
configurations were due only to statistical fluctuations, one 
would expect them to be distributed, as a function of depth, in 
a rather large number of short sequences (runs) of consecutive 
positive and negative values; the Wald-Wolfowitz test [171] 
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TABLE III 

P- VALUE OF GOODNESS-OF-FIT TESTS COMPARING HADRONIC ELASTIC 

SCATTERING OPTIONS 



Version 


Range 


Model 


Kolmogorov 
Smirnov 


Anderson 
Darling 


Cramer 
von Mises 


9.3 


Whole 


Bertini 

LEP 

CHIPS 


1 

1 

0.997 


1 

1 

0.997 


1 

1 

0.999 


Left 
branch 


Bertini 

LEP 

CHIPS 


1 

1 

0.996 


1 

1 

0.982 


1 

1 

0.999 


Right 
branch 


Bertini 

LEP 

CHIPS 


1 
1 
1 


0.986 
0.986 
0.986 


0.972 
0.972 
0.972 


9.1 


Left 
branch 


Bertini 
LEP 


1 
0.996 


1 
0.989 


1 
1 


8.1 


Left 
branch 


Bertini 
LEP 


1 
1 


0.999 

1 


0.997 

1 



Fig. 8. Relative difference of the energy deposited in the longitudinal 
slices of the sensitive water volume for various elastic scattering configu- 
rations, with respect to the results of the configuration as in Table [n] with 
G4UHadronElasticProcess and the G4HadronElastic model: Bertini (solid 
black histogram), LEP (blue dashed histogram) and CHIPS (red dotted 
histogram) elastic scattering. The relative difference is calculated in each slice 
of the longitudinal segmentation of the readout geometry associated with the 
sensitive volume. The energy deposition derives from one million primary 
protons with (E) = 63.95 MeV and a = 300 keV incident on water. 



was applied to evaluate this hypothesis. The resulting p-value 
is smaller than 0.001 for all the test cases; therefore one 
can infer some systematic effects associated with the choice 
of the elastic scattering model in the simulation. It is worth 
remarking that the conclusion drawn from the Wald-Wolfowitz 
test does not contradict the result of the goodness-of-fit tests: 
the two types of tests, respectively evaluating the differences 
between two distributions in terms of sign and of distance, 
are complementary. A feature of the energy deposition profiles 
hinting at systematic differences is visible in the vicinity of 
the Bragg peak in Fig. [7] where alternative elastic scattering 
options appear associated with sequences of energy deposition 
consistently larger or smaller than those deriving from the 
configuration of Table ITT1 encompassing the "U-elastic" elastic 
scattering model. 

For the use case under study, the small differences exhibited 
by the various simulation models look compatible with the 
experimental resolution typical of the application domain 
(of the order of 2-2.5% [154]); therefore, the peculiarities 
of the models do not affect the outcome of the simulation 
significantly. Based on these results, one can conclude that at 
the present stage all the Geant4 elastic scattering options are 
equivalent for the use case considered in this study. Validation 
against experimental data concerning the energy range and 
target materials pertinent to this use case would strengthen 
the predictive reliability of the simulation. 



E. Hadronic inelastic cross sections 

The proton total inelastic cross sections are an important 
parameter in the simulation of therapeutical proton beams, 
since they determine the amount of proton loss from the 
primary beam. 



Two configurations of cross sections were 
evaluated in this study: those implemented in 
G4HadronInelasticDataSet, originating from GHEISHA, 
and those implemented in G4ProtonInelasticCwssSection 
and G4NeutronInelasticCrossSection, respectively for protons 
and neutrons, covering the energy range above 6.8 MeV. 
Apart from this feature, all the other physics options in the 
simulation were set as in Table HU 

Both cross sections derive from parameterisations of exper- 
imental data; it is not clear whether the comparisons available 
in the literature concern the calibration of the parameterisation 
with experimental data, or represent the cross section model 
validation. 

No significant dependence on the cross section options is 
observed regarding the proton acceptance in the sensitive water 
volume, which is affected by the interactions to which primary 
protons are subject in the materials of the beam line. 

The two sets of cross sections determine some difference 
in the occurrence of the proton inelastic scattering process 
associated with them in the sensitive water volume. Confidence 
intervals for this quantity were calculated, using Student's t 
distribution, based on the simulation sample activating the 
cross sections as in [94]. The 99% confidence interval for the 
mean value of hadronic inelastic scattering occurrences lies 
between 1688 and 1849, when one million primary protons 
are generated, while the number of occurrences with the 
GHEISHA-like cross section data set is 1654; this value 
is significantly different from the mean number of inelastic 
scattering occurrences determined by the cross sections of 
[94]. 

Nevertheless, the effect of this difference on the longitudinal 
energy deposition appears negligible. The distribution of the 
relative differences of the energy deposition profiles associated 
with the two options is shown in Fig. |9j it is consistent with 
typical experimental uncertainties in hadron therapy practice. 
The longitudinal energy deposition profiles resulting from the 
two cross section options, with other physics settings as in 
Table [Til are compatible at 90% confidence level according to 
all the aforementioned goodness-of-fit-tests. 

Therefore one can conclude that the characteristics of the 
two hadronic cross section data sets do not affect the Simula- 
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Fig. 9. Percent relative difference of the energy deposition profiles resulting 
from the activation of G4HadronInelasticDataSet hadronic inelastic cross sec- 
tions or G4ProtonlnelasticCrossSection and G4NeutronInelasticCrossSection; 
all the other simulation options are identical in the two cases, and set as in 
Table [TT] The energy deposition derives from one million primary protons 
with (E) = 63.95 MeV and cr = 300 keV incident on water. The relative 
difference is calculated in each slice of the longitudinal segmentation of the 
readout geometry associated with the sensitive volume. 



Fig. 10. Relative difference of longitudinal energy deposition profiles 
associated with the GEM evaporation model, with respect to the "reference 
physics configuration": simulation based on Geant4 9.3 (solid line) and 9.1 
(dashed line). The observed systematic effect is related to the correction of 
a software feature in Geant4 9.3. The energy deposition derives from one 
million primary protons with (E) = 63.95 MeV and a = 300 keV incident 
on water. The relative difference is calculated in each slice of the longitudinal 
segmentation of the readout geometry associated with the sensitive volume. 



tion of the proton depth dose profiles in the use case considered 
in this study. 

F. Hadronic inelastic scattering models 

Several alternative hadronic inelastic scattering modeling 
options were evaluated: the Precompound model, the Bertini 
and Liege cascade models, the LEP parameterised model and 
the CHIPS model. In addition, a few configuration options of 
the nuclear deexcitation phase, accessible through the interface 
of the G4ExcitationHandler class instantiated by the Precom- 
pound model, were evaluated together with the Precompound 
model: the generalized evaporation (GEM) model replacing 
the default evaporation one, and the activation of the Fermi 
break-up. The Precompound model was evaluated both as a 
standalone model and as invoked by the Binary cascade model. 

Proton and neutron interactions were handled consistently in 
each simulation configuration by activating the same hadronic 
inelastic model option for both particles; all the other physics 
features were set as in Table [TTJ 

The longitudinal energy deposition profiles produced by 
the various hadronic inelastic models in Geant4 9.3 appear 
visually undistinguishable; therefore no related figure is shown 
in this paper. 

The energy deposition profile produced with the GEM 
evaporation model closely resembles the one deriving from 
the default evaporation model instantiated in connection with 
the Precompound model, as seen in Fig. [TO] this observation 
is confirmed by the results of the goodness-of-fit tests in Table 
||V]with 0.1 significance. Differences related to the use of the 
two models were visible with previous Geant4 versions, as 
shown in Fig. QJJ the GEM implementation was subject to 
improvements in Geant4 9.3 [73]. 

The distributions of the secondary particles produced in 
association with the two evaporation models look consistent, 



compatible with statistical fluctuations; the secondary proton 
distributions are shown in Fig. [TT] as an example. The lack 
of visible effects does not necessarily mean that these two 
models are characterized by identical features; rather, it shows 
that the use case under study is not sensitive to their possible 
difference. 

From this analysis one can conclude that the evaporation 
model options are equivalent for the simulation of the longi- 
tudinal energy deposition; as documented in section IVI1 the 
GEM model is computationally faster than the default one in 
the application under study. 

The evaporation process of nuclear deexcitation is based 
on the hypothesis that the excitation energy is high and 
approximately equally distributed among the nucleons: this 
assumption is justified for heavy nuclei, but it is not applicable 
to the water target considered in this use case. It is generally 
accepted that the Fermi break-up represents a more appropriate 
theoretical description of the nuclear deexcitation process for 
light nuclei: in MCNPX and FLUKA the Fermi break-up 
is applied to nuclei with atomic mass up to 17, whereas 
evaporation is applied to heavier nuclei; in Geant4 it is not 
invoked by default in the deexcitation of light nuclei, although 
the public interface of G4ExcitationHandler allows users to 
modify the default settings. 

The effects of the Fermi break-up were evaluated by activat- 
ing it in the simulation for nuclei with atomic number smaller 
than 10; they are visible in the spectrum of the produced 
secondaries, with respect to those produced by nuclear de- 
excitation proceeding through evaporation. From a theoretical 
perspective, the application of an evaporation model to the 
deexcitation of light nuclei is expected to overestimate the 
production of secondary protons in the lower energy range; 
this effect is indeed observed in Fig. [TT] The activation of the 
Fermi break up affects the longitudinal energy deposition, with 
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Fig. 11, Energy spectrum of secondary protons produced with different 
configurations of the Precompound model: default configuration as in Ta- 
ble |n] (black circles), configuration with GEM evaporation (red squares), 
configuration activating Fermi break up (blue triangles) and configuration 
activating the Binary Cascade model (white crosses), which in turn invokes 
the Precompound model to handle the preequilibrium phase. The secondary 
spectra derive from one million primary protons with (E) = 63.95 MeV and 
a = 300 keV incident on water. 



Fig. 13. Relative difference of the longitudinal energy deposition profile 
deriving the activation of the Precompound model either as a standalone 
model, or as invoked by the Binary cascade model. The energy deposition 
derives from one million primary protons with (E) = 63.95 MeV and 
a = 300 keV incident on water. The relative difference is calculated in each 
slice of the longitudinal segmentation of the readout geometry associated with 
the sensitive water volume. 
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Fig. 12. Relative difference of the longitudinal energy deposition profile 
deriving from the activation of the Fermi break-up, with respect to the profile 
deriving from the default configuration of the Precompound model; all the 
other settings are as in Table [TT] The energy deposition derives from one 
million primary protons with (E) = 63.95 MeV and cr = 300 keV incident 
on water. The relative difference is calculated in each slice of the longitudinal 
segmentation of the readout geometry associated with the sensitive volume. 



respect to that resulting from the default nuclear deexcitation 
settings: their relative difference, shown in Fig.[T2] exhibits an 
asymmetric distribution shifted towards negative values. This 
effect hints at a systematic contribution of the Fermi break-up 
to decrease the longitudinal energy deposition; nevertheless, 
the observed differences are consistent with typical uncertain- 
ties in experimental proton therapy practice. 

A similar asymmetry in the longitudinal energy deposition 
difference is observed when the preequilibrium phase is associ- 
ated with intranuclear transport; this effect is shown in Fig.[T3l 
which concerns two simulations involving the Precompound 
model, respectively as an independent model and invoked 



through the Binary Cascade model. The transition between the 
cascade process of intranuclear transport and the preequilib- 
rium is determined by empirical considerations [103], which 
are specific to each software implementation: for instance, in 
Geant4 Binary cascade model cascading continues as long as 
there are particles above a 70 MeV kinetic energy threshold 
[103] (along with other conditions required by the algorithm), 
while a smooth transition around 50 MeV is implemented 
in FLUKA [10]. The empirical features of the algorithm 
correspond to lack of knowledge from physical principles to 
determine the transition between the two regimes; the analysis 
shows that this epistemic uncertainty can be a source of 
systematic effects, such as the bias of the distribution in Fig. 
IT3l This effect, which is of a magnitude comparable to typical 
experimental uncertainties in hadron therapy measurements, 
could be significant in use cases where precise predictive 
power is expected from the Monte Carlo simulation. 

No such asymmetries, with respect to simulating the pree- 
quilibrium phase only (as in the Precompound model), are 
observed with two other configurations involving intranuclear 
cascade models - the Bertini and Liege ones. It is worth 
remarking that the Liege model does not involve a preequilib- 
rium phase at all, while the Bertini cascade model does. The 
apparent absence of consistent trends related to the adopted 
physical approach (modeling intranuclear cascade, preequilib- 
rium and their interplay) suggests that the observed behavior of 
the code may be influenced by other implementation details on 
top of the basic physics modeling approach; this consideration 
adds further complexity to the effort of identifying the sources 
of epistemic uncertainty in the simulation, which is a necessary 
step towards their reduction or their control. 

The relative differences of the energy deposition profiles 
concerning other hadronic inelastic models with respect to the 
outcome deriving from the Precompound one are illustrated in 
Fig. M 
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Fig. 14. Relative difference of the energy deposited in the longitudinal 
slices of the sensitive water volume associated with various hadronic inelastic 
options, with respect to the configuration of Table [TT] encompassing the 
Precompound model: Bertini (solid black histogram), LEP (blue dashed 
histogram), Liege (green dot-dashed histogram) and CHIPS (red dotted 
histogram). The energy deposition derives from one million primary protons 
with (E) = 63.95 MeV and a = 300 keV incident on water. 



All the final state models of hadronic inelastic scattering 
produce statistically compatible results at 90% confidence 
level in the considered use case, as shown in Table II VI 

The Wald-Wolfowitz test concerning the difference with re- 
spect to the reference physics configuration results in a p-value 
smaller than 0.001 for all the considered modeling options, 
with the exception of the comparison involving the Liege 
cascade model, for which the p-value is 0.360. These results 
suggest the presence of some systematic effects due to the 
choice of the hadronic inelastic models in the simulation; some 
asymmetries and bias with respect to zero are indeed visible in 
the distributions in Fig. [14] namely the one concerning the LEP 
inelastic model, apart from the previously discussed effects in 
Fig. H2 and [13] 

Like the results discussed in section |V-D"1 this result suggests 
that the selection of the hadronic inelastic model activated 
in the simulation can be source of systematic effects. Nev- 
ertheless, the differences concerning the longitudinal energy 
deposition patterns appear compatible with typical experi- 
mental uncertainties in proton therapy dosimetry; therefore 
the systematic effects identified by the Wald-Wolfowitz test 
would be negligible in that software application context. They 
could become relevant in use cases where higher accuracy is 
demanded. 

The implementation of the Geant4 Precompound model was 
subject to improvements [38] in Geant4 9.2. Nevertheless, 
these modifications do not appear to have affected the features 
of the longitudinal energy deposition pattern significantly, 
since the goodness-of-fit tests in Table II V I show that the energy 
deposition profiles associated with this model were compatible 
with those deriving from other hadronic inelastic models in 
previous Geant4 versions, as well as in the 9.3 one. This 
remark is relevant to previous applications of the Precompound 
model to the use case under study, which are archived in the 
literature. 



Fig. 15. Secondary proton spectrum resulting from various hadronic inelastic 
models: Precompound (black circles), Bertini (red squares), LEP (white 
crosses), Liege (blue triangles) and CHIPS (green stars). The secondary 
particle spectrum derives from one million primary protons with(E) = 63.95 
MeV and a = 300 keV incident on water. 



Nevertheless, despite their similarity at determining the 
longitudinal energy deposition profile, some hadronic inelastic 
models exhibit very different characteristics regarding the sec- 
ondary particles they generate: the secondary proton, neutron 
and a particle spectra are shown in Fig. Q3] to [T7] Radiotherapy 
applications can be affected by secondary particles within the 
target volume and outside it, both laterally and beyond the 
distal edge of the Bragg peak [155]; concerns for the risks 
due to secondary particles in proton therapy are discussed 
in the literature [173]. The analysis documented in the pre- 
vious paragraphs shows that the different secondary particle 
production patterns do not produce significant effects on the 
longitudinal energy deposition profile; the quantification of 
possible effects on the lateral energy deposition pattern would 
require substantially larger data samples, which were not 
achievable with the limited computational resources available 
to the authors in the course of the project documented in this 
publication, and is outside the scope of this paper. 

The identification of actual systematic effects related to 
hadronic inelastic models, and their quantitative estimate, 
would require experimental measurements with adequate ac- 
curacy to discriminate not only the features of the energy 
deposition distribution, but also the characteristics of the 
secondary particles they produce. 

G. Multiple Coulomb scattering 

The configuration of proton multiple scattering simulation 
in a Geant4 application involves the selection of the multiple 
scattering process and models to be activated, and setting 
some parameters used by the multiple scattering algorithm. 
Default options are provided in Geant4 kernel for the model 
and parameters associated to multiple scattering processes; 
they are summarized in Table [V] for a set of recent Geant4 
versions. 

The analysis evaluated whether recent evolutions of the 
code between Geant4 8.1 and 9.3 versions, which involve 



16 



TABLE IV 
P- VALUE OF GOODNESS-OF-FIT TESTS COMPARING HADRONIC INELASTIC SCATTERING OPTIONS 



Geant4 


Test 


Hadronic 


Kolmogorov 


Anderson 


Cramer 


version 


range 


model 


Smirnov 


Darling 


von Mises 






Bertini 


1 


1 


1 






LEP 


0.954 


0.988 


0.984 




Whole 


Liege 


1 


1 


1 






CHIPS 


1 


1 


I 


9.3 




GEM 


1 


1 


I 






Fermi break-up 


1 


1 


1 






Binary 


0.954 


0.938 


0.973 




Bertini 


1 


1 


1 




Left 


LEP 


0.945 


0.961 


0.979 




branch 


Liege 


1 


1 


1 






CHIPS 


1 


1 


1 






GEM 


1 


1 


1 






Fermi break-up 


1 


1 


1 






Binary 


0.945 


0.858 


0.962 




Bertini 


1 


0.986 


0.972 




Right 


LEP 


1 


0.986 


0.972 




branch 


Liege 


1 


0.986 


0.972 






CHIPS 


1 


0.986 


0.972 






GEM 


1 


0.986 


0.972 






Fermi break-up 


1 


0.986 


0.972 






Binary 


1 


0.986 


0.972 


9.1 


Left 


Bertini 


0.981 


0.901 


0.980 




branch 


LEP 


0.945 


0.949 


0.937 


8.1 


Left 


Bertini 


1 


1 


1 




branch 


LEP 


0.996 


0.814 


0.847 



TABLE V 
DEFAULT SETTINGS OF RELEVANT MULTIPLE SCATTERING PROCESSES 



Geant4 
version 


Process 


Range 
Factor 


Step 
Limit 


Lateral 
Displacement 


skin 


Geometry 
Factor 


Model 


8.1 
9.1 
9.2p0.3 
9.3 
9.3 


G4MultipleScattering 
G4MultipleScattering 
G4MultipleScattering 
G4MultipleScattering 
G4hMultipleScattering 


0.02 
0.02 
0.02 

0.04 
0.2 


1 
1 
1 
1 



1 
1 
1 
1 




3 
3 
3 


2.5 
2.5 
2.5 
2.5 


Urban 

Urban 

Urban 

Urban92 

Urban90 



different model and parameter settings, could be the source 
of systematic effects in Geant4-based simulations for proton 
therapy applications, originating from epistemic uncertain- 
ties in the simulation model. Two issues were addressed: 
the effects related to different multiple scattering processes, 
G4MultipleScattering and G4hMultipleScattering, and those 
due to changes in the G4MultipleScattering process since the 
8.1 release. It should be remarked that in the following analysis 
the behaviour associated with multiple scattering may result 
not only directly from the implementation of the two above 
mentioned classes, but also from behavior inherited from their 
base classes or acquired through aggregation of, or dependence 
on, other classes, as determined by the software design of 
Geant4 multiple scattering domain. The results are reported 
for Geant4 versions 8.1p02, 9.1, 9.2p03, and 9.3. 

Two multiple scattering processes, G4hMultipleScattering 
and G4MultipleScattering, are applicable to protons in Geant4 
9.3. The former was first introduced in Geant4 8.2 to pro- 
vide faster simulation of hadron transport; the latter com- 
plies with an earlier class interface and is planned to be 
withdrawn from later releases. In Geant4 9.3 the com- 
mon base class G4VMultipleScattering accounts for public 
member functions formerly specific to G4MultipleScattering. 
G4hMultipleScattering can be configured to acquire equiva- 



lent behavior to G4MultipleScattering by applying the same 
settings (model and parameters) as in G4MultipleScattering 
listed in Table [V] For convenience, the configuration of 
G4hMultipleScattering equivalent to G4MultipleScattering is 
still indicated as G4MultipleScattering in the following. 

Only the default settings listed in Table [V] were tested; 
the large effects related to epistemic uncertainties observed 
in this limited interval analysis, which are documented in the 
following, suggest that this complex problem domain would 
benefit from a larger-scale dedicated study beyond the scope 
of this paper. 

To acquire sound evidence of effects associated with multi- 
ple scattering settings, the comparisons were performed over 
five physics configurations: the set of processes and models 
(apart from multiple scattering) as in Table [Til and variants 
of it consisting of "LEP" and "Bertini" inelastic scattering 
together with "U-elastic" elastic scattering, and "LEP" and 
"Bertini" elastic scattering together with the Precompound 
hadronic inelastic model. Common effects observed in such 
an extended set of test cases could be reasonably associated 
with the multiple scattering domain, excluding their possible 
origin from intrinsic features of a single physics configuration. 

The resulting longitudinal energy deposition distributions 
associated with proton multiple scattering options are shown 
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Fig. 16. Secondary neutron spectrum resulting from various hadronic 
inelastic models: Precompound (black circles), Bertini (red squares), LEP 
(white crosses), Liege (blue triangles) and CHIPS (green stars). The secondary 
particle spectrum derives from one million primary protons with (E) = 63.95 
MeV and a = 300 keV incident on water. 
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Fig. 17. Secondary a particle spectrum resulting from various hadronic 
inelastic models: Precompound (black circles), Bertini (red squares), LEP 
(white crosses) and CHIPS (green stars); no a particles appear to be produced 
by the Geant4 implementation of the Liege model. The secondary particle 
spectrum derives from one million primary protons with (E) = 63.95 MeV 
and a = 300 keV incident on water. 



in Fig. [TU The two multiple scattering processes produce 
visibly different longitudinal energy deposition profiles; the 
extent of the differences can be quantitatively appraised in 
Fig. [19] which shows the variation of the longitudinal energy 
deposition profiles simulated with G4hMultipleScattering in 
Geant4 9.3 with respect to equivalent simulations performed 
with G4MultipleScattering in Geant4 9.3, 9.1 and 8.1p02. The 
plot shows results produced with the Bertini hadronic inelastic 
option; similar results are obtained with the other physics 
configurations subject to comparison. The energy deposition 
profile of Geant4 9.2p03 is not shown in Fig. [19] to avoid 
clogging the plot with many curves; it lies in between the 
profiles produced by Geant4 9.3 with G4hMultipleScattering 
and Geant4 9.1. 

The results of goodness-of-fit tests comparing longitudinal 
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Fig. 18. Bragg peak profile resulting from different multiple scattering 
processes and Geant4 versions: G4hMultipleScattering (black, thick solid 
line) in Geant4 9.3, G4MultipleScattering in Geant4 9.3 (dashed, green line), 
Geant4 9.2p03 (pink, thin solid line), Geant4 9.1 (dotted, blue line) and Geant4 
8.1p02 (dash-dotted, red line) The same physics configuration was activated in 
all the simulations, apart from the multiple scattering setting. The Bragg peak 
is from one million primary protons with (E) = 63.95 MeV and cr = 300 
keV incident on water; the plot shows the deposited energy in each slice of the 
longitudinally segmented readout geometry, integrated over all the generated 
events. 
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Fig. 19. Relative difference of the energy deposited in the longitudinal slices 
of the sensitive water volume associated with different multiple scattering 
processes and Geant4 versions; the difference is calculated in each longitudinal 
slice of the readout geometry with respect to a reference configuration 
with G4hMuhipleScattering in Geant4 9.3 for identical configurations with 
G4MultipleScattering in Geant4 9.3 (solid black histogram), 9.1 (dashed 
blue histogram), 9.2p03 (dotted pink histogram) and 8.1p02 (dash-dotted red 
histogram) versions. The reference configuration is as in Table fill except for 
the hadronic inelastic scattering option (Bertini instead of Precompound); this 
replacement is due to the greater stability of the Bertini code across the various 
Geant4 versions, nevertheless all other physics configurations produce similar 
results. The energy deposition derives from one million primary protons with 
(E) = 63.95 MeV and a = 300 keV incident on water. 
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energy deposition distributions associated with either proton 
multiple scattering process are summarized in Table [Vl] The 
longitudinal energy deposition distributions associated with 
G4hMultipleScattering are incompatible at 99.9% confidence 
level with those produced by Geant4 versions 8.1p02 and 
9.3 with G4MultipleScattering, and, apart from one test case 
involving the Kolmogorov-Smirnov test, at 99% confidence 
level with those produced by Geant4 9.1. Regarding the com- 
parison with the profiles generated with Geant4 9.2p03, the 
Anderson-Darling test rejects the hypothesis of compatibility 
with the profiles produced with G4hMultipleScatteringat 95% 
confidence level, the Kolmogorov-Smirnov does not reject 
it, while the Cramer-von Mises rejects it in two physics 
configurations and does not reject it in the other three configu- 
ration. The Anderson-Darling and Cramer-von Mises tests are 
considered more powerful than the Kolmogorov-Smirnov test; 
the Anderson-Darling test is especially sensitive to fat tails 
[176]. G4MultipleScattering was responsible of the multiple 
scattering process in these earlier Geant4 versions. 

It is worth mentioning that [35] shows comparisons of 
experimental and simulated proton energy deposition pro- 
files normalized to the number of protons in the beam; the 
reported simulations were performed with Geant4 8.1p01 
version. The small plots in logarithmic scale and the limi- 
tation of the comparisons to qualitative appraisal prevent the 
reader from understanding whether the different behaviour of 
G4hMultipleScattering, and of the G4MultipleScattering class 
in later Geant4 versions, with respect to the multiple scattering 
implementation of Geant4 8.1, would affect the compatibility 
with the experimental data of [35]. 

Visible differences are also observed in Fig. [20] con- 
cerning the energy deposition profiles associated with 
G4MultipleScattering settings over the various versions. The 
total energy deposition shown in Fig. [2J] exhibits evident 
differences associated with the various settings. 

The Geant4 Low Energy Electromagnetic package, used in 
all the simulations, was subject to configuration and Change 
Management discipline [174] based on the Unified Software 
Development Process [175] framework until Geant4 release 
9.1; the adopted software process ensured that the software 
of this package relevant to the use case under study did 
not undergo modifications between the 8.1p02 and 9.1 pro- 
duction versions, which could alter its physical behaviour. 
The same implementations of the low energy electromagnetic 
processes were used in the simulations based on Geant4 9.2 
and 9.3; therefore, it is plausible that variations observed in 
the simulation productions based on different Geant4 versions 
are associated with evolutions in other Geant4 domains. The 
extent of the differences observed when comparing two Geant4 
versions appears to be approximately the same over all the 
hadronic physics configurations activated in the simulation; 
since the occurrence of coherent modifications to all the 
Geant4 hadronic elastic and inelastic scattering implemen- 
tations is not likely, this observation suggests that coherent 
differences would derive from modifications either to Geant4 
transport kernel or to the multiple scattering domain, which 
are common to all the simulations. Major changes to Geant4 
kernel have not been documented over the considered versions; 
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Fig. 20. Relative difference of the energy deposited in the longitudinal slices 
of the sensitive water volume associated with the G4MultipleScattering in 
Geant4 9.2p03, with respect to identical simulation settings in Geant4 9.3 
(solid black histogram), Geant4 9.1 (dashed blue histogram), and Geant4 
8.1p02 (dotted red histogram). The reference configuration is as in Table 
ITT1 except for the hadronic inelastic scattering option (Bertini instead of 
Precompound); this replacement is due to the greater stability of the Bertini 
code across the various Geant4 versions, nevertheless all other physics 
configurations produce similar results. The energy deposition derives from 
one million primary protons with (E) = 63.95 MeV and a = 300 keV 
incident on water. 



the multiple scattering domain, which was subject to evolution, 
appears the most likely source of the observed discrepancies. 
Their origin is probably from epistemic uncertainties in the 
simulation models, whose validation in the energy range 
relevant to this use case is scarcely documented in literature. 

The differences concerning multiple scattering settings in 
the various Geant4 versions are significant. The 99.9% and 
99% confidence intervals for the mean value of the total energy 
deposition deriving from G4hMultipleScattering in Geant4 9.3 
are shown in Fig.[2T] the values deriving from Geant4 versions 
8.1p02, 9.1 and 9.2p03 fall outside the 99.9% confidence 
interval. 

The results of goodness-of-fit tests are reported in Table 
I VI I In most test cases the longitudinal energy deposition 
distributions produced with Geant4 9.3 are incompatible with 
those produced with Geant4 9.1 at 95% confidence level; 
in a few cases the test statistic results in p-values close to 
the critical region for 0.05 significance. The null hypoth- 
esis of equivalence of the distributions subject to test is 
not rejected, with the same significance, in the comparisons 
involving Geant4 9.3 and 8.1p02 versions. This quantitative 
result is consistent with the qualitative appraisal of Fig. [18] 
where the energy deposition profile deriving from Geant4 9.3 
appears closer to the one produced with the earlier 8.1p02 
version. The energy deposition profiles produced with Geant4 
9.2p03 are incompatible with those produced with Geant4 
9.3 (with G4MultipleScattering) and Geant4 8.1p02 with 0.05 
significance, while the goodness-of-fit tests fail to reject the 
hypothesis of compatibility with the profiles produced with 
Geant4 9.1 with 0.05 significance. The longitudinal energy 
deposition distributions produced with Geant4 9.1 and 8.1p02 
are incompatible with 0.05 significance. 
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Fig. 21. Total energy deposition in the sensitive volume associated with 
various Geant4 versions and physics configurations: Geant4 9.3 (red squares), 
Geant4 9.2p()3 (pink diamonds), Geant4 9.1 (blue circles) and Geant4 8.1p02 
(green triangles); the filled symbols correspond to simulations activating 
the G4MultipleScattering multiple scattering process, while the empty ones 
correspond to the activation of G4hMultipleScattering. The upper and lower 
lines of the horizontal axis identify respectively the hadronic elastic and 
inelastic scattering model in each simulation configuration; the other physics 
options, apart from the multiple scattering under test, were as in Table ITT1 The 
dashed and dotted lines represent respectively the 99.9% and 99% confidence 
intervals for the mean value of the total deposited energy over various 
Geant4 9.3 physics configurations associated with G4hMultipleScattering. 
The total energy deposition derives from one million primary protons with 
(E) = 63.95 MeV and a = 300 keV incident on water. 
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Fig. 22. Average energy deposition per proton entering the sensitive volume 
calculated with various Geant4 versions and physics configurations: Geant4 
9.3 (red squares), Geant4 9.2p03 (pink diamonds), Geant4 9.1 (blue circles) 
and Geant4 8.1p02 (green triangles); the filled symbols correspond to simu- 
lations activating the G4MuhipleScattering multiple scattering process, while 
the empty ones correspond to the activation of G4hMultipleScattering. The 
upper and lower lines of the horizontal axis identify respectively the hadronic 
elastic and inelastic scattering model in each simulation configuration; the 
other physics options, apart from the multiple scattering under test, were 
as in Table [TT] The dotted lines represent the 95% confidence interval for 
the mean value of the various Geant4 9.3 physics configurations associated 
with G4hMultipleScattering. The average energy deposition derives from one 
million primary protons with (E) = 63.95 MeV and a = 300 keV incident 
on water. 



The simulations with the two multiple scattering processes 
and with different Geant4 versions produce a significantly 
different total energy deposition in the sensitive volume. 
The results are shown in Fig. [2TJ the dashed and dotted 
lines in the plot represent respectively the 99.9% and 99% 
confidence intervals for the average energy deposition with 
G4hMultipleScattering over all Geant4 9.3 physics configura- 
tions. 

The absolute value of the energy deposition is relevant to 
applications where knowledge of the actual dose released to a 
target is critical, like oncological treatment planning, radiation 
protection or radiation damage estimate. The observed differ- 
ences would be significant in use cases where the simulation 
has a predictive role: differences greater than 10% in the dose 
released to a patient, like the effects observed with the various 
multiple scattering implementations released in Geant4, would 
be important in clinical applications. These use cases would 
not be limited to the bio-medical application domain; for 
instance, the use of Monte Carlo simulation to study the 
damage to electronic components exposed to radiation would 
require precise estimate of the released dose. 

The average energy deposition per proton in the sensitive 
volume, and the ratio between the energy deposited at the peak 
location and at the entrance of the sensitive volume are approx- 
imately the same for all the physics configurations and Geant4 
versions, as illustrated in Fig. [22] and [23] The 95% confidence 
interval for the mean value deriving from Geant4 9.3 with 
G4hMultipleScattering is shown in the figures to appreciate 
quantitatively the spread of the results. These observations 



suggest that the detailed features of the energy deposition in 
the water volume are insensitive to the physics options selected 
in the simulation, including multiple scattering, and to the 
evolutions of Geant4 software. 

The acceptance, i.e. the fraction of protons reaching the 
sensitive volume, out of all the primary generated ones, is 
plotted in Fig. [24] for different physics configurations and 
Geant4 versions. Various sources can affect it: inelastic nuclear 
reactions, which remove protons from the beam prior to 
reaching the sensitive volume, and nuclear elastic and multiple 
Coulomb scattering, which modify the protons' direction along 
with their passage through matter. 

The acceptance appears roughly constant in Fig. [24] for 
the various hadronic models, within the set of simulations 
associated with a given multiple scattering option and Geant4 
version; therefore, the features of these models can be ex- 
cluded as a source of significant differences. Complementary 
tests, whose results are not reported in Fig. [24] show that the 
acceptance is not significantly sensitive to alternative stopping 
power models either. The multiple scattering algorithm appears 
the most probable source of the observed differences. 

Fig. [2j] and [24] suggest a correlation between the total 
energy deposited in the sensitive volume and the acceptance. 
This effect was evaluated by means of Pearson's correlation 
coefficient [177]; the null hypothesis consists of assuming no 
correlation between these quantities. The correlation coeffi- 
cient, calculated over all physics configurations and Geant4 
versions examined in this study, is 0.965; the null hypothesis 
is rejected with 0.0001 significance. On the other hand, no 
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Fig. 23. Ratio of the energy deposition at the Bragg peak location and at 
the entrance of the sensitive volume, deriving from various Geant4 versions 
and physics configurations: Geant4 9.3 (red squares), Geant4 9.2p()3 (pink 
diamonds), Geant4 9.1 (blue circles) and Geant4 8.1p()2 (green triangles); the 
filled symbols correspond to simulations activating the G4MultipleScattering 
multiple scattering process, while the empty ones correspond to the activation 
of G4hMultipleScat1ering. The upper and lower lines of the horizontal axis 
identify respectively the hadronic elastic and inelastic scattering model in each 
simulation configuration; the other physics options, apart from the multiple 
scattering under test, were as in Table ITT1 The dashed lines represent the 95% 
confidence interval for the mean value of the various Geant4 9.3 physics 
configurations associated with G4hMultipleScattering. The energy deposition 
derives from one million primary protons with (E) = 63.95 MeV and a = 
300 keV incident on water. 



correlation of the total energy deposition is observed with the 
average energy deposited per proton, nor with the peak over 
entrance ratio: the corresponding correlation coefficients are 
0.120 and 0.151; these values lead to not rejecting the null 
hypothesis with 0.1 significance. 

These results hint that the observed discrepancies in the 
longitudinal energy deposit distributions are related to effects 
due to multiple scattering in the beam line, rather than to 
physics modeling effects in the water volume. Geant4 multi- 
ple scattering implementation encompasses various empirical 
parameters [74], whose settings are characterized by epistemic 
uncertainties; presumably, the observed effects are associated 
with different angular distribution (including backscattering) 
and lateral displacement of the scattered particle implemented 
in the various Geant4 multiple scattering options and versions, 
and the variations of empirical parameters governing the 
algorithm. 

This finding stresses the importance of accurately modeling 
the beam line geometry and material composition for accurate 
calculation of the energy deposited in the sensitive volume. It 
also highlights the importance of correctly simulating particle 
interactions not only in the sensitive parts of the experimental 
set-up, but also in its passive components, since the latter 
appear to be responsible for significant systematic effects on 
the energy deposited in the sensitive volume. 

In hadron therapy practice, proton depth dose profiles are 
usually normalized to a reference value (at the peak or at 
the entrance of the sensitive volume), or to the integral of 



Fig. 24. Percentage of primary protons (acceptance) reaching the sensitive 
volume deriving from various Geant4 versions and physics configurations: 
Geant4 9.3 (red squares), Geant4 9.2p03 (pink diamonds), Geant4 9.1 (blue 
circles) and Geant4 8.1p02 (green triangles); the filled symbols correspond to 
simulations activating the G4MultipleScattering multiple scattering process, 
while the empty ones correspond to the activation of G4hMultipleScattering. 
The upper and lower lines of the horizontal axis identify respectively the 
hadronic elastic and inelastic scattering model in each simulation configu- 
ration; the other physics options, apart from the multiple scattering under 
test, were as in Table [TT1 The dashed and dotted lines represent respectively 
the 99.9% and 99% confidence intervals for the mean value of the various 
Geant4 9.3 physics configurations of associated with G4hMultipleScattering. 
The simulation involves from one million primary protons with (E) = 63.95 
MeV and a = 300 keV incident on water. 



the dose, as discussed in [44]. The same goodness-of-fit tests 
reported in Table |VT] were performed after normalizing the 
energy deposition profiles to the total energy collected in 
the sensitive volume: they failed to reject the null hypothesis 
of compatibility with 0. 1 significance, which was rejected in 
the comparison of the original (non normalized) distributions. 
This analysis demonstrates that normalized distributions are 
insensitive to the large differences exhibited by the various 
models on an absolute scale. Therefore, comparisons like the 
one in Fig. 7 of [67], concerning an experimental Bragg peak 
and one simulated with Geant4 9.0, both normalized to 1, are 
of limited usefulness to clarify the issues that emerged in the 
previous analysis. 

Further tests were performed, activating the specific 
G4eMultipleScattering process for electron multiple scattering, 
released in Geant4 9.3; no effects were observed on the 
longitudinal energy deposition. 

The authors found only limited documentation in the liter- 
ature concerning the experimental validation of proton mul- 
tiple scattering implementations in recent Geant4 versions; 
the comparisons with experimental data reported in [178] 
concern muons and electrons and are not pertinent to the 
use case object of this investigation, which concerns protons. 
Therefore, this process is a source of epistemic uncertainty in 
the simulation; the analysis described in this paper shows that 
this uncertainty could determine large systematic effects in 
critical use cases. Further experimental measurements would 
be useful for the validation of Geant4 multiple scattering 
in use cases similar to the one considered in this paper; in 
particular, experimental data suitable to clarify the interplay 
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between energy deposition measurements and features of the 
multiple scattering algorithm would be beneficial. Ideally, 
the experiment should be able to measure effects related 
to backscattering and lateral displacement, which could be 
responsible for the discrepancies in the proton acceptance ob- 
served with the various algorithm implementations examined 
in this paper. 

VI. Computational performance 

The extensive survey of physics models and parameters 
relevant to the problem domain documented in the previous 
sections provides guidance for Geant4-based simulations con- 
cerning similar use cases. The computational performance of 
the available physics options is a relevant parameter in simu- 
lation applications, especially considering that some previous 
analyses demonstrate the equivalence of some of them regard- 
ing the physical features they produce. Therefore the analysis 
is complemented here by some information on the associated 
computational performance in the use case described in this 
paper, which can be useful to experimentalists in their Geant4- 
based applications. 

The results reported in Table IVIII related to Geant4 9.3 
version, show the average simulation time per primary gener- 
ated event in each physics configuration; they derive from the 
productions for the analysis described in the previous sections. 
The content of Table IVIII should not be considered as mea- 
surements of Geant4 computational performance in absolute 
terms: the application code contained analysis features, such as 
filling a large number of histograms, which added an additional 
burden to the execution with respect to the time strictly needed 
for particle transport; moreover no effort was invested in 
the optimization of the application code. However, since all 
the simulations reported in that table were run on identical 
hardware and platforms, the measured execution times are 
interesting for relative comparisons of the computational per- 
formance of the various physics configurations in the use case 
object of this study. 

The results reported in Table IVIII involve the ICRU49 
proton stopping power model; simulations involving the 
Ziegler77, Ziegler85 and Ziegler2000 models are slightly 
slower. Simulations involving G4hMultipleScattering require 
approximately 5% more CPU time than those involving 
G4MultipleScattering; however, the larger acceptance asso- 
ciated with this multiple scattering model requires longer 
computations to track a greater number of particles in the 
sensitive volume. 

It is worth remarking that accounting for nuclear interac- 
tions in the simulation application described in this paper in- 
creases the computational time consumption by approximately 
57%, with respect to considering electromagnetic interactions 
only. 

Based on Table IVIII one can observe that the hadronic 
elastic scattering models exhibit similar computational per- 
formance, with the exception of the CHIPS model, which 
is significantly slower; among the hadronic inelastic models, 
the Liege cascade and the LEP ones are faster than the other 
options. 



TABLE VII 

AVERGE CPU TIME PER PRIMARY GENERATED EVENT IN VARIOUS 

PHYSICS CONFIGURATIONS 



Hadronic elastic 


Hadronic inelastic 


CPU time (ms) 


Bertini-elastic 


Precompound 


254.0 ± 0.3 


LEP 


Precompound 


255.1 ± 0.3 


CHIPS-elastic 


Precompound 


293.3 ± 0.3 


U-elastic 


Precompound 


254.3 ± 0.3 


U-elastic 


Precompound-GEM 


251.1 ± 0.3 


U-elastic 


Precompound-Fermi break-up 


255.8 ± 0.3 


U-elastic 


Binary cascade 


261.3 ± 0.3 


U-elastic 


Bertini cascade 


251.7 ± 0.3 


U-elastic 


Liege cascade 


223.1 ± 0.2 


U-elastic 


LEP 


225.4 ± 0.2 


U-elastic 


CHIPS-inelastic 


250.4 ± 0.3 



VII. Conclusion 

A number of epistemic uncertainties have been identified in 
a survey of Geant4 physics models pertinent to the simulation 
of proton depth dose, which broadly represent the variety of 
approaches to describe proton interactions with matter in the 
energy range up to approximately 100 MeV 

In the electromagnetic domain, the epistemic uncertainties 
affecting the value of the water mean ionization potential and 
proton stopping powers derive from lack of consensus among 
various theoretical and experimental references documented 
in the literature; they generate significant systematic effects 
on the longitudinal pattern of energy deposit in the sensitive 
volume, namely on the depth of the Bragg peak. 

The epistemic uncertainties affecting the hadronic compo- 
nents of the simulation are related to the intrinsic differences 
of the modeling approaches and empirical parameters they 
contain; the limited validation of the models, and the unclear 
distinction between the processes of calibration and validation 
in the few published comparisons with experimental data, are 
the main sources of such uncertainties. Their effects on the 
longitudinal energy deposit are comparable with experimental 
uncertainties typical of proton therapy; the largest differences 
concern secondary particle spectra. A significant effect was 
observed in relation to the mode of nuclear deexcitation; in 
this respect, there is a consensus towards modeling it through 
Fermi break-up for light nuclei and evaporation for heavier 
ones. This approach is implemented in some Monte Carlo 
codes (e.g. MCNP and FLUKA), while it is not adopted 
by default in Geant4; users of this code would benefit from 
implementing appropriate settings in their Geant4-based appli- 
cations to activate Fermi break-up for the deexcitation of light 
nuclei, if their simulation use cases are prone to be affected 
by the systematic effects highlighted in this study. 

The analysis shows how the sensitivity of the simulation 
to epistemic uncertainties cannot be determined in absolute 
terms, rather it depends on the experimental application envi- 
ronment. The relatively large differences in the Bragg peak 
profile associated with the set of electromagnetic options 
are practically irrelevant in clinical practice, which tolerates 
adjustments of the beam parameters to reproduce a refer- 
ence proton range. However, these differences are relevant 
to applications where a predictive role is expected from the 
simulation, such as Monte Carlo based treatment planning 
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TABLE VI 

P- VALUE OF GOODNESS-OF-FIT TESTS COMPARING LONGITUDINAL ENERGY DEPOSITION PROFILES DERIVING FROM VARIOUS GEANT4 VERSIONS AND 

PHYSICS CONFIGURATIONS 



Compared 


Physics 


Kolmogorov 


Anderson 


Cramer 


software versions 


configuration 


Smirnov 


Darling 


von Mises 


9.3 (G4hMultipleScattering) 


Uelastic Bertini 


< 0.001 


< 0.001 


< 0.001 


9.3 (G4MultipleScattering) 


Uelastic LEP 


< 0.001 


< 0.001 


< 0.001 




Uelastic Precompound 


< 0.001 


< 0.001 


< 0.001 




Bertini Precompound 


< 0.001 


< 0.001 


< 0.001 




LEP Precompound 


< 0.001 


< 0.001 


< 0.001 


9.3 (G4hMultipleScattering) 


Uelastic Bertini 


0.219 


0.046 


0.110 


9.2 (G4MultipleScattering) 


Uelastic LEP 


0.074 


0.013 


0.047 




Uelastic Precompound 


0.170 


0.037 


0.111 




Bertini Precompound 


0.054 


0.012 


0.044 




LEP Precompound 


0.098 


0.019 


0.057 


9.3 (G4hMultipleScattering) 


Uelastic Bertini 


0.009 


0.001 


0.004 


9.1 (G4MultipleScattering) 


Uelastic LEP 


0.002 


< 0.001 


0.002 




Uelastic Precompound 


0.014 


0.002 


0.008 




Bertini Precompound 


0.004 


< 0.001 


0.002 




LEP Precompound 


0.006 


0.001 


0.006 


9.3 (G4hMultipleScattering) 


Uelastic Bertini 


< 0.001 


< 0.001 


< 0.001 


8.1 (G4MultipleScattering) 


Uelastic LEP 


< 0.001 


< 0.001 


< 0.001 




Uelastic Precompound 


< 0.001 


< 0.001 


< 0.001 




Bertini Precompound 


< 0.001 


< 0.001 


< 0.001 




LEP Precompound 


< 0.001 


< 0.001 


< 0.001 


9.3 (G4MultipleScattering) 


Uelastic Bertini 


0.006 


< 0.001 


0.004 


9.2 (G4MultipleScattering) 


Uelastic LEP 


0.001 


< 0.001 


0.002 




Uelastic Precompound 


0.006 


< 0.001 


0.003 




Bertini Precompound 


< 0.001 


< 0.001 


0.001 




LEP Precompound 


0.009 


< 0.001 


0.005 


9.3 (G4MultipleScattering) 


Uelastic Bertini 


0.277 


0.051 


0.113 


9.1 (G4MultipleScattering) 


Uelastic LEP 


0.039 


0.011 


0.043 




Uelastic Precompound 


0.054 


0.012 


0.040 




Bertini Precompound 


0.039 


0.007 


0.028 




LEP Precompound 


0.130 


0.030 


0.071 


9.3 (G4MultipleScattering) 


Uelastic Bertini 


0.803 


0.505 


0.722 


8.1 (G4MultipleScattering) 


Uelastic LEP 


0.277 


0.119 


0.270 




Uelastic Precompound 


0.515 


0.232 


0.475 




Bertini Precompound 


0.219 


0.072 


0.179 




LEP Precompound 


0.426 


0.150 


0.297 


9.2 (G4MultipleScattering) 


Uelastic Bertini 


0.426 


0.135 


0.286 


9.1 (G4MultipleScattering) 


Uelastic LEP 


0.709 


0.281 


0.395 




Uelastic Precompound 


0.884 


0.418 


0.548 




Bertini Precompound 


0.709 


0.324 


0.478 




LEP Precompound 


0.426 


0.269 


0.516 


9.2 (G4hMultipleScattering) 


Uelastic Bertini 


< 0.001 


< 0.001 


< 0.001 


8.1 (G4MultipleScattering) 


Uelastic LEP 


< 0.001 


< 0.001 


< 0.001 




Uelastic Precompound 


< 0.001 


< 0.001 


< 0.001 




Bertini Precompound 


< 0.001 


< 0.001 


< 0.001 




LEP Precompound 


< 0.001 


< 0.001 


< 0.001 


9.1 (G4MultipleScattering) 


Uelastic Bertini 


0.020 


0.004 


0.016 


8.1 (G4MultipleScattering) 


Uelastic LEP 


0.001 


< 0.001 


0.001 




Uelastic Precompound 


0.003 


< 0.001 


0.003 




Bertini Precompound 


< 0.001 


< 0.001 


< 0.001 




LEP Precompound 


0.006 


< 0.001 


0.003 



systems, currently the object of active research, or radiation 
protection. The different secondary particle spectra deriving 
from the range of available hadronic options do not affect 
the main parameter of clinical interest, i.e. the depth dose 
distribution, but they are relevant to other aspects of radiation 
exposure. 

By far the largest effects of physics-related epistemic un- 
certainties in the simulation of proton depth dose are observed 
in relation to modeling multiple scattering in the beam line. 
However, even these effects are relevant only to use cases 
where the simulation is invested with predictive role regarding 
the absolute dose released to the target; otherwise, common 
practices, like the normalization of the simulated dose to a ref- 



erence value, would mask the epistemic uncertainty associated 
with the empirical parameters used to model this process. 

The analysis also highlights the importance of a knowledge 
of the whole simulation system regarding the effects visible in 
the sensitive volume. Interactions in the beam line affect the 
spectrum of the protons reaching the sensitive volume and the 
dose released to it; lack of knowledge of construction details of 
the beam line, or epistemic uncertainties in modeling particle 
interactions in the passive components of the system, are prone 
to bias the simulation outcome. 

The results documented in this paper about the different 
observables produced by Geant4 physics options identify 
some experimental requirements for the discrimination of their 
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features and their validation. Experimental measurements of 
adequate accuracy could reduce the epistemic uncertainties 
evidenced in the electromagnetic domain; relevant data could 
derive either from a thorough survey of the existing literature, 
or from new, dedicated measurements. In this respect, it 
is worthwhile to recall the valuable reference role for the 
validation of electron simulation played by the high precision 
measurements of [179] and [180], which were originally 
motivated by the validation of the ITS (Integrated Tiger Series) 
[181] Monte Carlo code; similar measurements concerning 
protons would be useful to reduce epistemic uncertainties. 

The sensitivity analysis documented in the previous sections 
also provides guidance to design meaningful test cases for 
inclusion in the test process of Monte Carlo systems. The 
identification of distributions which expose distinctive features 
of the physics models, as well as of others, which are prone to 
hide them, is especially useful to designing test cases relevant 
to monitoring the effects of changes in some critical parts of 
the code. 

The analysis presented in this paper is a first attempt at 
estimating quantitatively the impact of epistemic uncertain- 
ties on the considered use case; further refinements would 
contribute to better understanding the problem. So far, the 
analysis has considered each source of epistemic uncertainty 
individually; nevertheless, it would be worthwhile to evalu- 
ate their combinations, since several systematic contributions 
could accumulate their effects to bias the final simulation 
result. More refined treatments, e.g. based on the theory of 
evidence, could shed additional light on the problem; these 
methods would be especially useful if practical constraints 
hinder the availability of further experimental measurements 
to reduce the current uncertainties. 

The identification of the epistemic uncertainties embedded 
in a large-scale simulation code is far from trivial; design 
methods facilitating their identification at early stages of the 
software development, and their management in sensitivity 
analyses, would be beneficial. To the best of the authors' 
knowledge, this issue has not been studied yet in the context 
of Monte Carlo simulation; techniques like aspect oriented 
programming could provide useful paradigms to address it, 
and the inclusion of epistemic uncertainties in the traceability 
process, in the context of a rigorous software process dis- 
cipline, would effectively support their handling in complex 
software systems. 

Although this paper illustrates the problem of epistemic 
uncertainties in a specific simulation use case, the issue it 
addresses goes beyond the limited application domain con- 
sidered in this initial study. Regarding the simulation of 
low energy proton interactions, the epistemic uncertainties 
discussed in this paper and their effects are likely to affect 
other experimental domains as well: from the exposure of 
electronic components and astronauts to the space radiation 
environment, to the problem of radiation monitoring at particle 
accelerators. 

More generally, the issue of identifying and quantifying 
epistemic uncertainties, and their contribution to the overall 
reliability of simulation systems, permeates all Monte Carlo 
application domains. Monte Carlo simulation - not only for 



particle transport in detectors, but also for event generators - 
is expected to play a critical role in the physics analysis of 
LHC data, which involves energies not yet covered by any 
experimental measurements in controlled laboratory environ- 
ments; the development of sound methods and tools to deal 
with the epistemic uncertainties embedded in LHC simulation 
software appears a major task for the coming years in support 
of LHC physics results. 
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